Chapter 1 INTRODUCTION

The discrimination problem may be formulated as follows. The statistician collects data $(X_1,\theta_1),\ldots,(X_n,\theta_n)$, a sequence of independent identically distributed (iid) random vectors drawn from the distribution of $(X,\theta)$, a random vector independent of the data. For each $1 \le i \le n$, the observation $X_i$ takes values in $R^m$ and its state $\theta_i$ takes values in $\{1,\ldots,M\}$. The discrimination problem is that of estimating the state $\theta$ from the data and the observation $X$ using procedures which do not require complete knowledge of the distribution of $(X,\theta)$. If $\hat\theta$ denotes the estimate, then a measure of the performance of the procedure, given the data $V_n = (X_1,\theta_1,\ldots,X_n,\theta_n)$, is $L_n = P\{\hat\theta \ne \theta \mid V_n\}$, the conditional probability of error.

If the distribution of $(X,\theta)$ is known, then the results of statistical decision theory show how to choose $\hat\theta$, given $X$, such that $P\{\hat\theta \ne \theta\}$ is minimal. The minimal value $L^*$ is the Bayes probability of error and we know that $L_n \ge L^*$ for all methods of obtaining $\hat\theta$ from $X$ and $V_n$. A discrimination rule, i.e., a sequence of methods for estimating $\theta$ from $X$ and $V_n$, is said to be asymptotically optimal if $L_n \to L^*$ in probability. The importance of asymptotically optimal discrimination rules is that, for the statistician, such rules are the only ones guaranteeing that $L_n$ is close to $L^*$ provided he collects enough data. One of our objectives is to show how to construct asymptotically optimal discrimination rules and to display several techniques to prove, or disprove, the asymptotic optimality of discrimination rules for certain large classes of distributions of $(X,\theta)$.

If the statistician has a fair amount of a priori knowledge about the distribution of $(X,\theta)$, then he may be able to construct a parametric model for the distribution of $(X,\theta)$, determine the parameters in the model that best fit the data and use this particular version of the model with $X$ to obtain an estimate $\hat\theta$ of $\theta$. However, if the model is not exact, then it is usually impossible to construct an asymptotically optimal discrimination rule in this manner. In the absence of sufficient knowledge about the distribution of $(X,\theta)$ to use such a parametric model, is it still possible to construct an asymptotically optimal discrimination rule? Under some mild conditions on the distribution of $(X,\theta)$, the answer is affirmative. In Chapters 2 and 5 a rather detailed study is made of the asymptotic properties of some of the most popular rules, including some original ones. Rules that are discussed include a generalized version of the celebrated k-nearest neighbor rule, two-step rules using Parzen-Rosenblatt density estimates or Loftsgaarden-Quesenberry density estimates, and histogram type discrimination rules. Of course we are bound to repeat some results that can be found elsewhere in the literature. However, to the best of the author's knowledge, the literature lacks a technical paper or book that systematically and rigorously treats asymptotic optimality and related problems for nonparametric discrimination rules.

If the distribution of $X$ is absolutely continuous with respect to Lebesgue measure, then it is shown in Chapter 2 how nonparametric density estimates can be employed to construct an asymptotically optimal discrimination rule. In Chapter 4 two popular density estimates, the Parzen-Rosenblatt estimate and the Loftsgaarden-Quesenberry estimate, are studied. In particular, conditions are obtained ensuring the pointwise convergence and the convergence in $L_r$ of these estimates, that are weaker than the conditions imposed by several authors over the past decade. In Chapter 3, some inequalities are developed concerning the uniform convergence of empirical measures in $R^m$. These inequalities are strong enough to prove the uniform convergence of the Parzen-Rosenblatt and Loftsgaarden-Quesenberry density estimates under the weakest conditions to date.

When the statistician has selected a discrimination rule and has collected data, he wants to know how his rule performs, that is, he would like to compute $L_n = P\{\hat\theta \ne \theta \mid V_n\}$. Of course, there is no way of computing $L_n$ since the distribution of $(X,\theta)$ is unknown. The statistician is thus forced to estimate $L_n$ from the data. It is shown in Chapter 6 that for most of the classical discrimination rules and nearly all the rules that are discussed in this paper, there exists a natural useful error estimate $\hat{L}_n$ of $L_n$. We display upper bounds for $P\{|\hat{L}_n - L_n| \ge \epsilon\}$ ($\epsilon > 0$) that do not depend upon the distribution of $(X,\theta)$. These bounds enable the statistician to compute how much confidence he can put in his estimate $\hat{L}_n$ even if he has no idea what the distribution of $(X,\theta)$ looks like. Error estimates that are discussed include the resubstitution estimate, the deleted estimate and the holdout estimate.

The dissertation is organized as follows. Chapters 2 and 5 treat the asymptotic optimality of nonparametric discrimination rules. To be able to read Chapter 5, a few results from Chapters 3 and 4 are needed. Chapters 3 and 4 on empirical measures and density estimation can be read separately. Chapter 6 on distribution-free error estimation, which has the highest concentration of new theorems, can be read separately too provided that the reader is willing to have a quick look at Chapter 2 for the definitions of some symbols. In Chapter 6 we have tried to provide a sound theoretical framework for further research in the relatively young area of error estimation. The proofs pertaining to the theorems of a given chapter are gathered in an appendix at the end of each chapter. The bibliography is far from exhaustive. However, for each subject, we have tried to list the leading theoretical papers, at least one survey paper or book, and most of the technical articles in which alternative proofs or closely related theorems can be found.
Chapter 2 DISCRIMINATION

2.1 Introduction

A statistician observes a random vector $X$, taking values in $R^m$, and wishes to estimate its state $\theta$ taking values in $\{1,\ldots,M\}$. To do so he has collected data $(X_1,\theta_1),\ldots,(X_n,\theta_n)$, a sequence of independent, identically distributed random vectors distributed as $(X,\theta)$ where

(a) $P\{\theta = j\} = \pi_j$, $1 \le j \le M$; and
(b) given that $\theta = j$, $X$ has probability measure $\mu_j$ on the Borel sets of $R^m$.   (2.1)

Using (2.1) we see that the probability measure for $X$ is

$\mu = \sum_{j=1}^{M} \pi_j \mu_j$.

We assume that $(X,\theta)$ is independent of the data.

A discrimination rule, or simply rule, is a sequence $\{\delta_n\}$ of decision functions where $\delta_n = (\delta_{n1},\ldots,\delta_{nM})$ is a Borel measurable mapping from $(R^m \times \{1,\ldots,M\})^n \times R^m$ to $[0,1]^M$ with the property that

$\sum_{j=1}^{M} \delta_{nj}(v,x) = 1$ for all $v \in (R^m \times \{1,\ldots,M\})^n$, $x \in R^m$.   (2.2)

If $V_n = (X_1,\theta_1,\ldots,X_n,\theta_n)$ denotes the data sequence and $X_0$ is any random vector taking values in $R^m$, then let $\theta_{V_n,X_0}$ be a random variable taking values in $\{1,\ldots,M\}$ whose distribution is determined by the joint distribution of $V_n$ and $X_0$ and by

$P\{\theta_{V_n,X_0} = j \mid V_n, X_0\} = \delta_{nj}(V_n,X_0)$, $1 \le j \le M$.


The statistician estimates $\theta$ by $\theta_{V_n,X}$. For each $n$ we define the local conditional probability of error $L_{n,X}$, the conditional probability of error $L_n$ and the probability of error $R_n$ by

$L_{n,X} = P\{\theta_{V_n,X} \ne \theta \mid V_n, X\} = 1 - E\{\delta_{n\theta}(V_n,X) \mid V_n, X\}$   (2.3)

$L_n = P\{\theta_{V_n,X} \ne \theta \mid V_n\} = E\{L_{n,X} \mid V_n\}$   (2.4)

$R_n = P\{\theta_{V_n,X} \ne \theta\} = E\{L_n\}$   (2.5)

where we used the smoothing property for conditional expectations in (2.4). From the strong law of large numbers, $L_n$ is the limiting frequency of errors when a large number of independent observations, all distributed as $(X,\theta)$, have their states estimated using $\delta_n$ and $V_n$.
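The interpretation of $L_n$ as a limiting frequency of errors can be illustrated numerically. The following is a minimal sketch, not part of the original development: it assumes a hypothetical two-state problem on the real line ($M = 2$, $m = 1$, equal priors, unit-variance normal class distributions) and uses the nearest neighbor estimate as the decision function. The data $V_n$ are drawn once and then held fixed, and the error frequency over many fresh observations approximates $L_n$ for that particular $V_n$.

```python
import random

def draw_pair(rng):
    """Draw (X, theta): theta is 1 or 2 with equal probability, X | theta ~ N(theta, 1)."""
    theta = rng.choice([1, 2])
    return rng.gauss(float(theta), 1.0), theta

def nearest_neighbor_estimate(data, x):
    """A deterministic decision function: the state of the data point closest to x."""
    return min(data, key=lambda pair: abs(pair[0] - x))[1]

def error_frequency(data, n_test, rng):
    """Frequency of errors over n_test fresh observations, with the data held fixed."""
    errors = 0
    for _ in range(n_test):
        x, theta = draw_pair(rng)
        if nearest_neighbor_estimate(data, x) != theta:
            errors += 1
    return errors / n_test

rng = random.Random(0)
data = [draw_pair(rng) for _ in range(50)]    # V_n with n = 50
print(error_frequency(data, 5000, rng))       # approximates L_n for this V_n
```

Averaging the printed value over many independent draws of the data would in turn approximate $R_n = E\{L_n\}$.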

Assume for the moment that the statistician knows the distribution (2.1). In that case he can estimate $\theta$ from $X$ in the following fashion. Let $\delta = (\delta_1,\ldots,\delta_M)$ be a Borel measurable mapping from $R^m$ to $[0,1]^M$ such that

$\sum_{j=1}^{M} \delta_j(x) = 1$, $x \in R^m$,

and let $\theta_X$, his estimate of $\theta$, be a random variable taking values in $\{1,\ldots,M\}$ whose distribution is determined by the distribution of $X$ and

$P\{\theta_X = j \mid X\} = \delta_j(X)$, $1 \le j \le M$.

The knowledge of (2.1) is sufficient to enable the statistician to find a $\delta$ for which $P\{\theta_X \ne \theta\}$, the probability of error with $\delta$, is minimal among all such Borel measurable mappings from $R^m$ to $[0,1]^M$. He could proceed as follows. He knows that there exist Borel measurable functions $p_1,\ldots,p_M$ from $R^m$ to $[0,1]$ such that

$p_j(X) = P\{\theta = j \mid X\}$, $1 \le j \le M$, ae($\mu$)   (2.6)

where ae($\mu$) denotes almost everywhere with respect to the measure $\mu$. We note that $p_1,\ldots,p_M$ are unique up to a $\mu$-null set and therefore,

$\sum_{j=1}^{M} p_j(X) = 1$ ae($\mu$).

Let $\delta^* = (\delta^*_1,\ldots,\delta^*_M)$ be a Borel measurable mapping from $R^m$ to $[0,1]^M$ such that

$\sum_{j=1}^{M} \delta^*_j(x) = 1$, $x \in R^m$   (2.7)

$\delta^*_j(x) = 0$ whenever $p_j(x) < \max_{1 \le k \le M} p_k(x)$, $1 \le j \le M$, $x \in R^m$.   (2.8)

It is clear that $\delta^*$ is not uniquely defined because $\delta^*$ depends upon the version $(p_1,\ldots,p_M)$ that is used in (2.6). Let $\theta^*_X$ be the corresponding estimate of $\theta$, and let $L^* = P\{\theta^*_X \ne \theta\}$. It is worth noting that $L^*$ is well-defined, that is, $L^*$ depends upon the distribution (2.1) but not on the version $(p_1,\ldots,p_M)$ that is used in the definition of $\delta^*$. It is not hard to see that $\delta^*$ is the best possible mapping from $R^m$ to $[0,1]^M$ satisfying (2.7) because, for any other $\delta$,

$P\{\theta_X \ne \theta \mid X\} \ge P\{\theta^*_X \ne \theta \mid X\}$ ae($\mu$)   (2.9)

and

$P\{\theta_X \ne \theta\} \ge P\{\theta^*_X \ne \theta\} = L^*$.

Moreover, for any decision function $\delta_n$, with probability one,

$L_{n,X} \ge P\{\theta^*_X \ne \theta \mid X\}$ ae($\mu$)   (2.10)

and

$L_n \ge L^*$.

The proofs of (2.9) and (2.10) are given in section 2.4. The quantity $L^*$ is usually referred to as the Bayes probability of error and any Borel measurable mapping $\delta = (\delta_1,\ldots,\delta_M)$ from $R^m$ to $[0,1]^M$ satisfying (2.7) for which the probability of error with $\delta$ is $L^*$, is called a Bayes decision function. In particular $\delta^*$ is a Bayes decision function.
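The construction (2.6)-(2.8) can be carried out explicitly when $X$ takes finitely many values. The sketch below is only an illustration under assumptions that are not in the text: a hypothetical two-point problem with known priors and conditional probabilities, states indexed from 0 as is usual in Python, and ties among maximizers split evenly (one admissible choice among many). It computes the posterior probabilities $p_j(x)$, one Bayes decision function, and the probability of error of an arbitrary decision function, so that $L^*$ can be compared with the error of a competing $\delta$.

```python
def posteriors(priors, cond):
    """p_j(x) = pi_j m_j(x) / m(x) at each point x, and 0 where m(x) = 0."""
    post = {}
    for x in cond[0]:
        joint = [pi * m[x] for pi, m in zip(priors, cond)]
        total = sum(joint)
        post[x] = [v / total if total > 0 else 0.0 for v in joint]
    return post

def bayes_rule(post):
    """One delta* satisfying (2.7), (2.8): equal weight on the maximizers of p_j(x)."""
    rule = {}
    for x, p in post.items():
        best = max(p)
        winners = [j for j, v in enumerate(p) if v == best]
        rule[x] = [1.0 / len(winners) if j in winners else 0.0 for j in range(len(p))]
    return rule

def error_probability(priors, cond, rule):
    """P{theta_X != theta} = sum over j and x of pi_j m_j(x) (1 - delta_j(x))."""
    return sum(pi * m[x] * (1.0 - rule[x][j])
               for j, (pi, m) in enumerate(zip(priors, cond))
               for x in m)

priors = [0.5, 0.5]                                    # hypothetical pi_0, pi_1
cond = [{"a": 0.8, "b": 0.2}, {"a": 0.3, "b": 0.7}]    # hypothetical P{X = x | theta = j}
delta_star = bayes_rule(posteriors(priors, cond))
always_first = {"a": [1.0, 0.0], "b": [1.0, 0.0]}      # a competing decision function
print(error_probability(priors, cond, delta_star))     # the Bayes probability of error L*
print(error_probability(priors, cond, always_first))   # never smaller than L*
```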

The question that naturally arises is, does $L_n$ converge to $L^*$ in some probabilistic sense as $n$ tends to infinity and how fast does it converge? We say that a rule $\{\delta_n\}$ is asymptotically optimal if

$L_n \to L^*$ in probability.   (2.11)

The next theorem deals with the connection between the convergence of $\delta_n$ to $\delta^*$ and that of $L_n$ to $L^*$. Theorem 2.1 essentially implies that to construct an asymptotically optimal rule, the statistician should be looking for decision functions that approximate the unknown Bayes decision function $\delta^*$.

Theorem 2.1. If for all $1 \le j \le M$ and all $x \in B$ where $B$ is a Borel set from $R^m$ with $\mu(B) = 1$,

either $p_j(x) = \max_{1 \le i \le M} p_i(x)$

or $\delta_{nj}(V_n,x) \to 0$ in probability (wp1)

(where $(p_1,\ldots,p_M)$ is a given version of (2.6)), then $L_n \to L^*$ in probability (wp1).


Let $M$ and $m$ be fixed throughout this dissertation. Three interesting classes of discrimination problems will be studied in more detail. If $\mu_1,\ldots,\mu_M$ are absolutely continuous with respect to the Lebesgue measure in $R^m$, we will call it a type $C_1$ discrimination problem. In that case, there are densities $f_1,\ldots,f_M$ corresponding to $\mu_1,\ldots,\mu_M$ such that for all Borel sets $B \subseteq R^m$

$\mu_i(B) = \int_B f_i(x)\,dx$, $1 \le i \le M$.

The $f_i$ are the Radon-Nikodym derivatives of the $\mu_i$ with respect to the Lebesgue measure in $R^m$. Clearly, every $f_i$ is determined up to a set of Lebesgue measure zero.

If there exists a countable set of points, say $B$, such that $\mu_1(B) = \ldots = \mu_M(B) = 1$, then we say that (2.1) defines a type $C_2$ discrimination problem. A type $C_3$ discrimination problem will occur if there exist $\mu$-almost everywhere continuous versions $p_1,\ldots,p_M$ in (2.6). It turns out, as we will see in Chapter 5, that it is very easy to devise asymptotically optimal decision rules for this class of problems.

In the next two sections we will demonstrate how the problem of the development of asymptotically optimal rules $\{\delta_n\}$ for type $C_1$ and type $C_2$ discrimination problems can be reduced to the problem of estimating densities and measures.
2.2 Type $C_1$ Discrimination Problems

Assume that to all the $\mu_j$ in (2.1) correspond densities $f_j$, $1 \le j \le M$, that $M = 1$ and that the statistician has a way of estimating, for all $x \in R^m$, $f_1(x)$ from $V_n$. Let $f_{1n}$ be a Borel measurable mapping from $R^{mn} \times R^m$ to $R$, that is, $f_{1n}$ is a Borel measurable function of $X_1,\ldots,X_n$ and $x$. We say that the sequence of estimates $\{f_{1n}\}$ of $f_1$ is weakly (strongly) consistent on $B$ (where $B$ is a Borel set from $R^m$) if

$f_{1n}(x) \to f_1(x)$ in probability (wp1), all $x \in B$.   (2.12)


If $M > 1$, it is only natural to estimate the $f_j$ using only those $(X_i,\theta_i)$ from $V_n$ for which $\theta_i = j$. Therefore, define the random variables $N_{1n},\ldots,N_{Mn}$ by

$N_{jn} = \sum_{i=1}^{n} I_{\{\theta_i = j\}}$, $1 \le j \le M$,   (2.13)

so that $N_{jn}$ is the number of observations in $V_n$ that have state $j$. It is clear that

$\sum_{j=1}^{M} N_{jn} = n$, all $n$.   (2.14)

If we are going to use the $f_{jN_{jn}}$ as estimates of $f_j$, $1 \le j \le M$, we can ask ourselves if the sequences $\{f_{jN_{jn}}\}$ inherit the nice consistency properties given in (2.12). This question will be treated further on.

If $f = \sum_{j=1}^{M} \pi_j f_j$, then it is not hard to see that a version $(p_1,\ldots,p_M)$ of (2.6) is given by

$p_j(x) = \pi_j f_j(x)/f(x)$ if $f(x) > 0$, and $p_j(x) = 0$ if $f(x) = 0$.   (2.15)

From (2.7), (2.8), we know that if $\delta^* = (\delta^*_1,\ldots,\delta^*_M)$ is a Borel measurable mapping from $R^m$ to $[0,1]^M$ satisfying (2.7) and

$\delta^*_j(x) = 0$ if $\pi_j f_j(x) < \max_{1 \le i \le M} \pi_i f_i(x)$, $1 \le j \le M$,   (2.16)

then $\delta^*$ is a Bayes decision function for the discrimination problem defined by (2.1). To obtain a decision function that is close to $\delta^*$, we could thus try to estimate the $\pi_j f_j$ and use these estimates in (2.16). We will of course use the density estimates $f_{jN_{jn}}$, $1 \le j \le M$, to approximate the $f_j$. We estimate the $\pi_j$ by $\pi_{jn}$, $1 \le j \le M$, where

$\pi_{jn} = N_{jn}/n$, $1 \le j \le M$.   (2.17)

Let $\delta_n$ be a decision function satisfying

$\delta_{nj}(V_n,x) = 0$ if $\pi_{jn} f_{jN_{jn}}(x) < \max_{1 \le i \le M} \pi_{in} f_{iN_{in}}(x)$, $1 \le j \le M$, $x \in R^m$.   (2.18)

We show that the following is true.

Theorem 2.2. Let $\{f_{jn}\}$ be a weakly (strongly) consistent sequence of estimates of $f_j$ on $B$ for all $1 \le j \le M$, and let $\mu(B) = 1$. If $\{\delta_n\}$ is a rule satisfying (2.18), then $L_n \to L^*$ in probability (wp1).

Theorem 2.2 states that if we have a sequence of estimates of $f_j$ ($1 \le j \le M$) that is weakly consistent ae($\mu$), then we can construct an asymptotically optimal decision rule $\{\delta_n\}$. One way of constructing such a rule is as in (2.17), (2.18). We will refer to decision rules of this type as two-step rules.
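A two-step rule of the form (2.17), (2.18) can be sketched in a few lines. The code below is an illustration only and makes several assumptions that are not in the text: $m = 1$, a Parzen-Rosenblatt estimate with a Gaussian kernel, a fixed bandwidth `h` chosen by hand (consistency requires the bandwidth to depend on the sample size; the relevant conditions are the subject of chapter 4), and ties broken in favor of the smallest state. Putting all the weight on one maximizer of $\pi_{jn} f_{jN_{jn}}(x)$ is one way of satisfying (2.18).

```python
import math
import random

def kernel_density(sample, x, h):
    """Parzen-Rosenblatt estimate at x with a Gaussian kernel and bandwidth h."""
    if not sample:
        return 0.0
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
    return total / (len(sample) * h * math.sqrt(2.0 * math.pi))

def two_step_estimate(data, x, num_states, h):
    """Return a state j that maximizes pi_jn * f_jN_jn(x), so that (2.18) holds."""
    n = len(data)
    scores = []
    for j in range(1, num_states + 1):
        sample_j = [xi for xi, theta in data if theta == j]   # the N_jn points with state j
        pi_jn = len(sample_j) / n                              # (2.17)
        scores.append(pi_jn * kernel_density(sample_j, x, h))
    return 1 + scores.index(max(scores))                       # first maximizer wins ties

rng = random.Random(3)
# Hypothetical data: state 1 centered at -2, state 2 centered at +2.
data = [(rng.gauss(-2.0, 1.0), 1) for _ in range(100)]
data += [(rng.gauss(2.0, 1.0), 2) for _ in range(100)]
print(two_step_estimate(data, -2.0, 2, 0.5), two_step_estimate(data, 2.0, 2, 0.5))
```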

Several types of weakly and strongly consistent sequences of density estimates are studied in chapter 4. In chapter 5, the corresponding two-step rules are discussed and some other decision rules are developed that are asymptotically optimal for all type [illegible] discrimination problems, that is, the class of problems for which there exist $\mu$-almost everywhere continuous densities $f_i$, $1 \le i \le M$. We note here that type [illegible] problems are necessarily type $C_1$ and type $C_3$ problems but that every problem that is type $C_1$ and type $C_3$ is not in general a type [illegible] problem.

2.3 Type $C_2$ Discrimination Problems

Let there exist a set

$B = \bigcup_{k=1}^{\infty} \{x_k\}$

with $\mu(B) = 1$. Every $\mu_j$ is with probability one characterizable by the probability distribution $(m_{j1}, m_{j2}, \ldots)$ where

$m_{jk} = P\{X = x_k \mid \theta = j\}$, $1 \le j \le M$, $k = 1,2,\ldots$   (2.19)

Let $m_k = P\{X = x_k\}$, $k = 1,2,\ldots$ and note that

$m_k = \sum_{j=1}^{M} P\{\theta = j\}\, P\{X = x_k \mid \theta = j\} = \sum_{j=1}^{M} \pi_j m_{jk}$, $k = 1,2,\ldots$   (2.20)

Further, any Borel measurable function $p_j: R^m \to [0,1]$ coinciding on $B$ with (2.21) is a version of (2.6):

$p_j(x_k) = P\{\theta = j \mid X = x_k\} = \pi_j m_{jk}/m_k$ if $m_k > 0$, and $p_j(x_k) = 0$ if $m_k = 0$, $1 \le j \le M$, $k = 1,2,\ldots$   (2.21)


From section 2.1 we know that any Borel measurable function $\delta^* = (\delta^*_1,\ldots,\delta^*_M)$ from $R^m$ to $[0,1]^M$ satisfying (2.7) and for which

$\delta^*_j(x_k) = 0$ if $\pi_j m_{jk} < \max_{1 \le i \le M} \pi_i m_{ik}$, $1 \le j \le M$, $k = 1,2,\ldots$   (2.22)

is a Bayes decision function for the discrimination problem defined by (2.1). Our aim is to replace the $\pi_j m_{jk}$ in (2.22) by suitable estimates so that the newly obtained decision rule is asymptotically optimal.

Let $N_{1n},\ldots,N_{Mn}$ be the number of occurrences of the states $1,\ldots,M$ in the data $(X_1,\theta_1),\ldots,(X_n,\theta_n)$ (see (2.13)). Define

$N^k_{jn} = \sum_{i=1}^{n} I_{\{X_i = x_k,\ \theta_i = j\}}$.   (2.23)

$N^k_{jn}$ is thus the number of $(X_i,\theta_i)$ with $X_i = x_k$ and $\theta_i = j$. Obviously,

$\sum_{k=1}^{\infty} N^k_{jn} = N_{jn}$, $1 \le j \le M$.

The natural estimate of $\pi_j m_{jk}$ is $N^k_{jn}/n$. Therefore, let $\delta_n$ be a decision function satisfying

$\delta_{nj}(V_n,x_k) = 0$ if $N^k_{jn} < \max_{1 \le i \le M} N^k_{in}$, $1 \le j \le M$, $k = 1,2,\ldots$   (2.24)

Notice that (2.24) does not put any restrictions on $\delta_n$ outside $B$. The following theorem can be proved.


Theorem 2.3. For any decision rule satisfying (2.24) for all $n$, $L_n \to L^*$ wp1. In fact, for every $\epsilon > 0$, there exist constants $K_1 > 0$ and $K_2 > 0$ depending upon $\pi_1,\ldots,\pi_M$, $\mu_1,\ldots,\mu_M$ and $\epsilon$ such that

$P\{L_n - L^* > \epsilon\} \le K_1 e^{-K_2 n}$.   (2.25)

The first part of theorem 2.3 can be proved by means of theorem 2.1 (which in turn uses lemma 2.3). We have added an alternative proof which has the advantage of being more direct and of providing more information concerning the rate of convergence of $L_n$ to $L^*$. Using the Borel-Cantelli lemma, the bound (2.25) is strong enough to prove that $L_n \to L^*$ wp1.

If the $\mu_1,\ldots,\mu_M$ are not atomic measures, then one can partition $R^m$ into a countable number of sets, consider each set as one point in a new space and use a decision rule satisfying (2.24) for all $n$ in this new space. The discrimination rule so obtained is called a histogram type discrimination rule.
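A histogram type rule can be sketched as follows. This is a minimal illustration under assumptions not made in the text: $m = 1$, a partition of the real line into intervals of a fixed hypothetical width, and ties (including empty cells) resolved in favor of the smallest state. The counts play the role of the $N^k_{jn}$ in (2.23), with cell $k$ treated as one point, and the estimate puts all its weight on a state with the largest count, as (2.24) requires.

```python
from collections import Counter

def cell_of(x, width):
    """Index k of the interval [k*width, (k+1)*width) that contains x."""
    return int(x // width)

def fit_counts(data, width):
    """Counts N^k_jn: the number of observations with state j falling in cell k."""
    return Counter((cell_of(x, width), theta) for x, theta in data)

def histogram_estimate(counts, x, num_states, width):
    """Return a state with the largest count in the cell of x (smallest state wins ties)."""
    k = cell_of(x, width)
    return max(range(1, num_states + 1), key=lambda j: (counts[(k, j)], -j))

data = [(0.1, 1), (0.2, 1), (0.3, 2), (1.2, 1), (1.5, 2), (1.7, 2)]   # hypothetical V_n
counts = fit_counts(data, 1.0)
print(histogram_estimate(counts, 0.5, 2, 1.0), histogram_estimate(counts, 1.9, 2, 1.0))
```

Storing only the nonempty cells in a `Counter` keeps the sketch usable with a countably infinite partition, since a missing key is read as a count of zero.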

2.4  Proofs 

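Proof of (2.9) and (2.10)

Given $\delta_n$, note that ae($\mu$),

$L_{n,X} = 1 - E\{\delta_{n\theta}(V_n,X) \mid V_n, X\}$

$= \sum_{j=1}^{M} p_j(X) - \sum_{j=1}^{M} \delta_{nj}(V_n,X)\, P\{\theta = j \mid V_n, X\}$

$= \sum_{j=1}^{M} \left( p_j(X) - \delta_{nj}(V_n,X)\, P\{\theta = j \mid X\} \right)$

$= \sum_{j=1}^{M} p_j(X) \left( 1 - \delta_{nj}(V_n,X) \right)$.   (2.26)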


Similarly, for any Borel measurable mapping $\delta$ from $R^m$ to $[0,1]^M$ with

$\sum_{j=1}^{M} \delta_j(x) = 1$, $x \in R^m$,   (2.27)

$P\{\theta_X \ne \theta \mid X\} = \sum_{j=1}^{M} p_j(X) \left( 1 - \delta_j(X) \right)$ ae($\mu$).

Let us therefore define

$L_n(x) = \sum_{j=1}^{M} p_j(x) \left( 1 - \delta_{nj}(V_n,x) \right)$   (2.28)

$L(x) = \sum_{j=1}^{M} p_j(x) \left( 1 - \delta_j(x) \right)$   (2.29)

$L^*(x) = \sum_{j=1}^{M} p_j(x) \left( 1 - \delta^*_j(x) \right)$   (2.30)

where $\delta^*$ is the Bayes decision function defined by (2.7) and (2.8). We see that $L_n(X) = L_{n,X}$ ae($\mu$), $P\{\theta_X \ne \theta \mid X\} = L(X)$ ae($\mu$) and $P\{\theta^*_X \ne \theta \mid X\} = L^*(X)$ ae($\mu$).

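To show (2.10), note that ae($\mu$), wp1,

$L_{n,X} = L_n(X) = \sum_{j=1}^{M} p_j(X) \left( 1 - \delta_{nj}(V_n,X) \right)$

$\ge \sum_{j=1}^{M} p_j(X) - \max_{1 \le k \le M} p_k(X)$

$= L^*(X) = P\{\theta^*_X \ne \theta \mid X\}$.

Taking conditional expectations yields, wp1,

$L_n = E\{L_{n,X} \mid V_n\} \ge P\{\theta^*_X \ne \theta\} = L^*$.

In a similar fashion we can prove (2.9).

Q.E.D.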


Proof of Theorem 2.1

Let $(p_1,\ldots,p_M)$ be a given version of (2.6), and let

$J(x) = \{ i: 1 \le i \le M;\ p_i(x) = \max_{1 \le j \le M} p_j(x) \}$.   (2.31)

Then from (2.28-2.31), we have

Lemma 2.1.

$L_n(x) - L^*(x) = \sum_{j=1}^{M} \left( \max_{1 \le i \le M} p_i(x) - p_j(x) \right) \delta_{nj}(V_n,x)$.

Proof:

From (2.28) and (2.30) we have

$L_n(x) - L^*(x) = \sum_{j=1}^{M} p_j(x) \left( \delta^*_j(x) - \delta_{nj}(V_n,x) \right)$

$= \max_{1 \le i \le M} p_i(x) \left( 1 - \sum_{j \in J(x)} \delta_{nj}(V_n,x) \right) - \sum_{j \notin J(x)} p_j(x)\, \delta_{nj}(V_n,x)$

$= \sum_{j \notin J(x)} \left( \max_{1 \le i \le M} p_i(x) - p_j(x) \right) \delta_{nj}(V_n,x)$.

Lemma 2.1 now follows from the definition of $J(x)$.

Q.E.D.

Lemma 2.2.

If $\delta_{nj}(V_n,x) \to 0$ in probability (wp1) ae($\mu$) for all $j \notin J(x)$, then $L_n(x) - L^*(x) \to 0$ in probability (wp1) ae($\mu$).

Proof:

Lemma 2.2 follows from lemma 2.1.

Q.E.D.

We will make extensive use of the following lemma.

Lemma 2.3.

Let $g_n$ be a Borel measurable function of $V_n$ and $x$ taking values in $[0,1]$ where $x \in R^m$ and $n = 1,2,\ldots$. If

$g_n(V_n,x) \to 0$ in probability (wp1) ae($\mu$)

then

$\int_{R^m} g_n(V_n,x)\, \mu(dx) \to 0$ in probability (wp1).

Proof:

For the in probability part, we argue as follows. Let $\epsilon > 0$ be arbitrary. Then

$P\left\{ \int_{R^m} g_n(V_n,x)\, \mu(dx) \ge \epsilon \right\} \le \frac{1}{\epsilon} \int_{R^m} E\{g_n(V_n,x)\}\, \mu(dx)$

$= \frac{1}{\epsilon} \int_{\{x:\ g_n(V_n,x) \to 0 \text{ in probability}\}} E\{g_n(V_n,x)\}\, \mu(dx) \to 0$

by Markov's inequality, Tonelli's theorem and two applications of the Lebesgue dominated convergence theorem. For the wp1 part of the theorem, we can argue as in the proof of theorem A of Glick (1974).

Q.E.D.

As a corollary of lemma 2.3, we have the following lemma.

Lemma 2.4.

If $L_n(x) - L^*(x) \to 0$ in probability (wp1) ae($\mu$), then $L_n - L^* \to 0$ in probability (wp1).

Proof:

Notice that

$L_n - L^* = E\{L_n(X) - L^*(X) \mid V_n\}$

and that by (2.28), (2.30), $L_n$ and $L^*$ are Borel measurable functions of $V_n$ and $x$ taking values in $[0,1]$ ae($\mu$), where the set

$\{x: L_n(x) \notin [0,1] \text{ or } L^*(x) \notin [0,1]\} \subseteq \{x: \sum_{j=1}^{M} p_j(x) \ne 1\}$

which is a $\mu$-null set not depending upon $V_n$. It is easy to see that lemma 2.3 remains valid for this case.

Q.E.D.

Lemmas 2.2 and 2.4 together imply theorem 2.1.

Proof of Theorem 2.2

The following three lemmas are needed to prove theorem 2.2.

Lemma 2.5.

Let $\{f_{jn}\}$ be a weakly (strongly) consistent sequence of estimates of $f_j$ at $x$, $1 \le j \le M$. Let $\pi_{1n},\ldots,\pi_{Mn}$ be defined by (2.17). Then, for all $1 \le j \le M$,

$|\pi_{jn} f_{jN_{jn}}(x) - \pi_j f_j(x)| \to 0$ in probability (wp1).

Proof:

Let $\epsilon > 0$ be arbitrary and let $j = 1$ without loss of generality. If $\pi_1 = 0$, then $\pi_{1n} = 0$ wp1 for all $n$ and lemma 2.5 follows. So we do assume that $\pi_1 > 0$.

Recall that for any two sequences of random variables $Y_1,\ldots,Y_n,\ldots$ and $Z_1,\ldots,Z_n,\ldots$, $Y_n Z_n \to YZ$ in probability (wp1) if $Y_n \to Y$ in probability (wp1) and $Z_n \to Z$ in probability (wp1), where $Y$ and $Z$ are arbitrary random variables. Therefore, we only need to show that $|f_{1n}(x) - f_1(x)| \to 0$ in probability (wp1) implies that $|f_{1N_{1n}}(x) - f_1(x)| \to 0$ in probability (wp1), because we know by the strong law of large numbers that $|\pi_{1n} - \pi_1| \to 0$ wp1.

Then it is obvious that

$P\{|f_{1N_{1n}}(x) - f_1(x)| \ge \epsilon\}$

$\le P\{N_{1n} < n\pi_1/2\} + P\{N_{1n} \ge n\pi_1/2;\ |f_{1N_{1n}}(x) - f_1(x)| \ge \epsilon\}$

$\le P\left\{ \frac{N_{1n} - E\{N_{1n}\}}{n} < -\frac{\pi_1}{2} \right\} + \sup_{k \ge n\pi_1/2} P\{|f_{1k}(x) - f_1(x)| \ge \epsilon\}$

$\to 0$

by the weak law of large numbers and by our assumption. For the wp1 part of the theorem, note that

$P\left\{ \bigcup_{k \ge n} \{|f_{1N_{1k}}(x) - f_1(x)| \ge \epsilon\} \right\}$

$\le P\{N_{1n} < n\pi_1/2\} + P\left\{ N_{1n} \ge n\pi_1/2;\ \bigcup_{k \ge n} \{|f_{1N_{1k}}(x) - f_1(x)| \ge \epsilon\} \right\}$

$\le P\{|\pi_{1n} - \pi_1| \ge \pi_1/2\} + P\left\{ \bigcup_{k \ge n\pi_1/2} \{|f_{1k}(x) - f_1(x)| \ge \epsilon\} \right\}$

$\to 0$

by the weak law of large numbers for $\pi_{1n}$ and by our assumption.

With the aid of lemma 2.5, the trivial lemma 2.6 given below, (2.15) and lemma 2.2, we are in a position to deduce lemma 2.7.

Lemma 2.6.

If $\{\delta_n\}$ is a decision rule satisfying (2.18) and if for all $1 \le j \le M$,

$|\pi_{jn} f_{jN_{jn}}(x) - \pi_j f_j(x)| \to 0$ in probability (wp1),

then $\delta_{nj}(V_n,x) \to 0$ in probability (wp1) for all $j$ with $\pi_j f_j(x) < \max_{1 \le i \le M} \pi_i f_i(x)$.

Proof:

We prove the convergence in probability version of lemma 2.6. The wp1 version is proved similarly. Let $x \in R^m$ and let $J(x) = \{j_1,\ldots,j_N\} \subseteq \{1,\ldots,M\}$ be as in (2.31) where $(p_1,\ldots,p_M)$ is defined by (2.15). Let

$d = \inf_{j \in J(x),\ k \notin J(x)} \left( \pi_j f_j(x) - \pi_k f_k(x) \right)$.

By definition of $J(x)$, we know that $d > 0$. Let $\epsilon > 0$ be arbitrary and let $j \notin J(x)$. Then

$P\{\delta_{nj}(V_n,x) > \epsilon\} \le P\left\{ \pi_{jn} f_{jN_{jn}}(x) \ge \max_{1 \le i \le M} \pi_{in} f_{iN_{in}}(x) \right\}$

$\le P\left\{ \bigcup_{k=1}^{M} \{ |\pi_{kn} f_{kN_{kn}}(x) - \pi_k f_k(x)| \ge d/2 \} \right\} \to 0$

since $|\pi_{jn} f_{jN_{jn}}(x) - \pi_j f_j(x)| \to 0$ in probability for all $1 \le j \le M$.

Q.E.D.
Lemma 2.7.

Let $\{f_{jn}\}$ be a weakly (strongly) consistent sequence of estimates of $f_j$ on $B$ for all $1 \le j \le M$ and let $\mu(B) = 1$. If $\{\delta_n\}$ is defined by (2.18) for all $n$, then

$L_n(x) - L^*(x) \to 0$ in probability (wp1), all $x \in B$

where $L_n$ and $L^*$ are defined by (2.28), (2.30) and depend upon the version $(p_1,\ldots,p_M)$ of (2.6) that is defined by (2.15) and the given densities $f_1,\ldots,f_M$.

Theorem 2.2 follows from lemmas 2.7 and 2.4.

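Proof of Theorem 2.3

By theorem 2.1 it suffices to show for all $x \in B$ and all $j \in \{1,\ldots,M\}$ with

$p_j(x) < \max_{1 \le i \le M} p_i(x)$

that $\delta_{nj}(V_n,x) \to 0$ wp1.

Let $x_k \in B$ and let $j \notin J(x_k)$. We know that

$\pi_j m_{jk} = \max_{1 \le i \le M} \pi_i m_{ik} - d$

for some $d > 0$. Next, let $\epsilon > 0$. It is clear that

$P\{\delta_{nj}(V_n,x_k) > \epsilon\} \le P\left\{ \bigcup_{i=1}^{M} \left\{ \left| \frac{N^k_{in}}{n} - \pi_i m_{ik} \right| \ge \frac{d}{2} \right\} \right\} \le 2M e^{-2n(d/2)^2}$

by Hoeffding's inequality (Hoeffding, 1963). Clearly,

$\sum_{n=1}^{\infty} P\{\delta_{nj}(V_n,x_k) > \epsilon\} < \infty$

so that $\delta_{nj}(V_n,x_k) \to 0$ wp1 by the Borel-Cantelli lemma.

Q.E.D.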


Alternative Proof of Theorem 2.3

Let $\epsilon > 0$ be arbitrary. Define for any $\{i_1,\ldots,i_N\} \subseteq \{1,\ldots,M\}$ with $1 \le i_1 < \ldots < i_N \le M$, $b > 0$ and $a > 0$

$B_{i_1,\ldots,i_N} = \{ x_k: p_{i_1}(x_k) = \ldots = p_{i_N}(x_k) > \sup_{j \notin \{i_1,\ldots,i_N\}} p_j(x_k) \}$

$C^b_{i_1,\ldots,i_N} = \{ x_k: p_{i_1}(x_k) = \ldots = p_{i_N}(x_k) > \sup_{j \notin \{i_1,\ldots,i_N\}} p_j(x_k) + b \}$

and

$D_a = \{ x_k: m_k \ge a;\ x_k \in B \}$.

Let

$F = \bigcup_{N=1}^{M} \bigcup_{\{i_1,\ldots,i_N\} \subseteq \{1,\ldots,M\}} \left( C^b_{i_1,\ldots,i_N} \cap D_a \right)$.

Because

$B = \bigcup_{N=1}^{M} \bigcup_{\{i_1,\ldots,i_N\} \subseteq \{1,\ldots,M\}} B_{i_1,\ldots,i_N}$

we note that

$B \cap F^c = \bigcup_{N=1}^{M} \bigcup_{\{i_1,\ldots,i_N\} \subseteq \{1,\ldots,M\}} B_{i_1,\ldots,i_N} \cap \left( (C^b_{i_1,\ldots,i_N})^c \cup D_a^c \right)$

where $(\cdot)^c$ denotes the complement of a set. It is possible to choose $a$ and $b$ small enough so that

$P\{X \in F^c\} < \epsilon/2$.

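Therefore, we have, ae($\mu$) and wp1,

$L_n(X) - L^*(X) \le I_{\{X \in F^c\}} + I_{\{X \in F\}} \left( L_n(X) - L^*(X) \right)$

and thus, wp1,

$L_n - L^* \le P\{X \in F^c\} + E\{ I_{\{X \in F\}} (L_n(X) - L^*(X)) \mid V_n \}$

and

$P\{L_n - L^* \ge \epsilon\} \le P\left\{ E\{ I_{\{X \in F\}} (L_n(X) - L^*(X)) \mid V_n \} > \epsilon/2 \right\}$

$\le \frac{2}{\epsilon} E\{ I_{\{X \in F\}} (L_n(X) - L^*(X)) \}$

$\le \frac{2}{\epsilon} \sum_{N=1}^{M} \sum_{\{i_1,\ldots,i_N\} \subseteq \{1,\ldots,M\}} \sup_{x \in D_a \cap C^b_{i_1,\ldots,i_N}} E\{ L_n(x) - L^*(x) \}$

$\le \frac{2}{\epsilon} \sum_{N=1}^{M} \sum_{\{i_1,\ldots,i_N\} \subseteq \{1,\ldots,M\}} \sup_{x \in D_a \cap C^b_{i_1,\ldots,i_N}} E\left\{ \sum_{j \notin \{i_1,\ldots,i_N\}} \left( p_{i_1}(x) - p_j(x) \right) \delta_{nj}(V_n,x) \right\}$

$\le \frac{2}{\epsilon} \sum_{N=1}^{M} \sum_{\{i_1,\ldots,i_N\} \subseteq \{1,\ldots,M\}} (M-N) \sup_{j \notin \{i_1,\ldots,i_N\}} \sup_{x \in D_a \cap C^b_{i_1,\ldots,i_N}} P\{ \delta_{nj}(V_n,x) > 0 \}$

$\le \frac{2}{\epsilon} \sum_{N=1}^{M} \sum_{\{i_1,\ldots,i_N\} \subseteq \{1,\ldots,M\}} 2(M-N)M\, e^{-2n(ab/2)^2}$

$\le \frac{2}{\epsilon} \sum_{N=1}^{M} \binom{M}{N} 2(M-N)M\, e^{-na^2b^2/2}$

$\le \frac{1}{\epsilon}\, 2^{M+2} M^2 e^{-na^2b^2/2}$

where we used Chebyshev's inequality, lemma 2.1, the inequality used in the first proof of theorem 2.3 and the fact that if $x_k$ belongs to $C^b_{i_1,\ldots,i_N} \cap D_a$, then $\pi_{i_1} m_{i_1 k} \ge \pi_j m_{jk} + ab$ for all $j \notin \{i_1,\ldots,i_N\}$. This proves (2.25). Theorem 2.3 follows by the Borel-Cantelli lemma and the arbitrariness of $\epsilon$.

Q.E.D.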
Chapter 3 EMPIRICAL MEASURES

3.1 Introduction

Let $\mathcal{B}$ be the Borel sets of $R^m$ and let $X_1,\ldots,X_n$ be independent identically distributed random vectors with values in $R^m$ and with a common probability measure $\mu$. The empirical measure $\mu_n$ for $X_1,\ldots,X_n$ is defined by

$\mu_n(A) = \left( \sum_{i=1}^{n} I_{\{X_i \in A\}} \right) / n$, $A \in \mathcal{B}$.   (3.1)

In this section we study the closeness of $\mu_n(A)$ to $\mu(A)$ and, in particular, we obtain explicit upper bounds for

$\sup_{A \in \mathcal{G}} P\{ |\mu_n(A) - \mu(A)| \ge \epsilon \}$   (3.2)

and

$P\{ \sup_{A \in \mathcal{G}} |\mu_n(A) - \mu(A)| \ge \epsilon \}$   (3.3)

where $\epsilon > 0$ and where $\mathcal{G}$ is a given subclass of Borel sets.
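The definition (3.1) translates directly into code. The sketch below is an illustration under assumptions of our own choosing: $m = 1$, $\mu$ the uniform distribution on $[0,1]$, and a set $A$ described by a hypothetical indicator function, so that $\mu(A)$ is known and the deviation $|\mu_n(A) - \mu(A)|$ appearing in (3.2) can be printed.

```python
import random

def empirical_measure(sample, indicator):
    """mu_n(A) = (number of sample points in A) / n, with A given by its indicator."""
    return sum(1 for x in sample if indicator(x)) / len(sample)

def in_A(x):
    """Indicator of A = [0.2, 0.5], for which mu(A) = 0.3 under the uniform law."""
    return 0.2 <= x <= 0.5

rng = random.Random(1)
sample = [rng.random() for _ in range(1000)]          # X_i uniform on [0, 1]
print(abs(empirical_measure(sample, in_A) - 0.3))     # |mu_n(A) - mu(A)|
```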

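By Hoeffding's inequality (Hoeffding, 1963) we already know that

$\sup_{A \in \mathcal{B}} P\{ |\mu_n(A) - \mu(A)| \ge \epsilon \} \le 2 e^{-2n\epsilon^2}$.   (3.4)

Further, if $\mathcal{G} \subseteq \mathcal{B}$ is such that

$\sup_{A \in \mathcal{G}} \mu(A) = g \le 1/2$

then, by Bennett's inequality (Bennett, 1962), we conclude that for all $A \in \mathcal{G}$

$P\{ |\mu_n(A) - \mu(A)| \ge \epsilon \} \le 2 \exp\left( -n\epsilon \left[ \left( 1 + \frac{g}{\epsilon} \right) \ln\left( 1 + \frac{\epsilon}{g} \right) - 1 \right] \right)$.

From $\ln(1 + a/b) \ge 2a/(2b + a)$ for all $a > 0$, $b > 0$,

$\sup_{A \in \mathcal{G}} P\{ |\mu_n(A) - \mu(A)| \ge \epsilon \} \le 2 e^{-n\epsilon^2/(2g+\epsilon)}$.   (3.5)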
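To see when (3.5) improves on (3.4), the two right-hand sides can be tabulated for a few values of $g$. The sample size and $\epsilon$ below are arbitrary illustrative choices. Comparing the exponents shows that (3.5) is the smaller bound exactly when $2g + \epsilon < 1/2$, that is, for classes of sets with small measure.

```python
import math

def hoeffding_bound(n, eps):
    """Right-hand side of (3.4)."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)

def bound_3_5(n, eps, g):
    """Right-hand side of (3.5), for a class of sets with sup mu(A) = g <= 1/2."""
    return 2.0 * math.exp(-n * eps ** 2 / (2.0 * g + eps))

n, eps = 1000, 0.05            # illustrative values only
for g in (0.01, 0.1, 0.5):
    print(g, hoeffding_bound(n, eps), bound_3_5(n, eps, g))
```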

Let us turn now to the study of upper bounds for (3.3), postponing for the time being the question of the measurability of $\sup_{A \in \mathcal{G}} |\mu_n(A) - \mu(A)|$. We remark that for absolutely continuous measures $\mu$,

$\sup_{A \in \mathcal{B}} |\mu_n(A) - \mu(A)| = 1$ wp1   (3.6)

so that it makes sense to restrict ourselves to proper subsets $\mathcal{G}$.
The best known result is perhaps the Glivenko-Cantelli lemma (see Loève, 1963) which states that

$\sup_{A \in \mathcal{G}} |\mu_n(A) - \mu(A)| \to 0$ wp1   (3.7)

where $m = 1$ and $\mathcal{G} = \{ (-\infty,x] \mid x \in R \}$. This result was later generalized to $R^m$ by Wolfowitz (1960). If $\mathcal{K}_\ell$ is the class of all sets from $R^m$ that are obtained by intersecting $\ell$ closed linear halfspaces (i.e. sets of the form $a_1 x_1 + \ldots + a_m x_m \ge a_{m+1}$, where $a_i, x_i \in R$, $1 \le i \le m$ and $a_{m+1} \in R$) then Rao (1962) shows that

$\sup_{A \in \mathcal{K}_\ell} |\mu_n(A) - \mu(A)| \to 0$ wp1.   (3.8)

On the other hand, there are $\mu$ for which

$\sup_{A \in \bigcup_\ell \mathcal{K}_\ell} |\mu_n(A) - \mu(A)| = 1$ wp1.   (3.9)

Another interesting class of Borel sets from $R^m$ is the class $\mathcal{G}_c$ of Borel measurable convex sets from $R^m$. Rao (1962) shows that there are $\mu$ for which

$\sup_{A \in \mathcal{G}_c} |\mu_n(A) - \mu(A)| = 1$ wp1

but that if every $A$ in $\mathcal{G}_c$ has a $\mu$-null boundary (which is always the case if $\mu$ is absolutely continuous with respect to Lebesgue measure), then

$\sup_{A \in \mathcal{G}_c} |\mu_n(A) - \mu(A)| \to 0$ wp1

if and only if $\mu_n(A) \to \mu(A)$ wp1 for every $A$ with $\mu(\text{boundary } A) = 0$.

Our main objective is not so much to obtain new results of the type (3.7), (3.8) but to find explicit upper bounds for (3.3) for some interesting subclasses $G$. However, we will display upper bounds that are strong enough to imply (3.7) and (3.8) by the Borel-Cantelli lemma. Most inequalities encountered in the literature regarding (3.3) deal with the class $G_0$ of sets $(-\infty,x]$, $x \in R^m$. The interest in this class $G_0$ is that if $F$ and $F_n$ are the distribution functions corresponding to $\mu$ and $\mu_n$, then

$$D_n = \sup_{x \in R^m} |F_n(x)-F(x)| = \sup_{A \in G_0} |\mu_n(A)-\mu(A)|. \qquad (3.10)$$

For the case $m=1$, Dvoretzky, Kiefer and Wolfowitz (1956) showed that there exists a universal constant $C>0$ such that for all $\epsilon>0$

$$P\{D_n \ge \epsilon\} \le Ce^{-2n\epsilon^2}. \qquad (3.11)$$

For $m>1$, Kiefer and Wolfowitz (1958) later showed that there exist constants $C_1>0$, $C_2>0$, both depending upon $m$, such that

$$P\{D_n \ge \epsilon\} \le C_1e^{-C_2n\epsilon^2} \qquad (3.12)$$

for all $\epsilon>0$. Kiefer (1961) improved this result by showing that for each $b \in (0,2)$ there exists a constant $C_{b,m}>0$ such that for all $\epsilon>0$

$$P\{D_n \ge \epsilon\} \le C_{b,m}e^{-(2-b)n\epsilon^2}. \qquad (3.13)$$

However, in none of the cited papers are explicit expressions obtained for $C$, $C_1$, $C_2$ and $C_{b,m}$. Among other things we will find an explicit expression for a constant $C$ (depending upon $m$) such that for all $\epsilon>0$

$$P\{D_n \ge \epsilon\} \le C n^{2m} e^{-2n\epsilon^2}. \qquad (3.14)$$

To derive (3.14) and other bounds for (3.3) we will employ techniques that were first suggested by Vapnik and Chervonenkis (1971). Their original result is the following. If $x_i \in R^m$, $1 \le i \le n$, and if $N_G(x_1,\ldots,x_n)$ denotes the total number of different sets in $\{\{x_1,\ldots,x_n\} \cap A \mid A \in G\}$, then

$$P\{\sup_{A \in G} |\mu_n(A)-\mu(A)| \ge \epsilon\} \le 4s(G,2n)e^{-n\epsilon^2/8} \qquad (3.15)$$

where

$$s(G,n) = \max_{(x_1,\ldots,x_n)} N_G(x_1,\ldots,x_n). \qquad (3.16)$$

We remark that (3.15) and related inequalities are only useful if suitable upper bounds can be found for $s(G,2n)$ for the class $G$ under consideration. The second section of this chapter deals with some techniques to find such bounds. In the third section, the main results are presented.

3.2 Upper Bounds For s(G,n)

Let $G$ be a class of sets from $R^m$ ($m \ge 1$) and let $s(G,n)$ be the maximal number of different sets in $\{\{x_1,\ldots,x_n\} \cap A \mid A \in G\}$ when the maximum is taken over all $(x_1,\ldots,x_n) \in R^{mn}$. It is clear that for any $G$, $s(G,n) \le 2^n$. We state three lemmas. Lemma 3.1 is proved by Vapnik and Chervonenkis (1971). The proof of lemma 3.2 is trivial.

Lemma 3.1. $s(G,n)$ is either exactly $2^n$ or else upper bounded by $n^b+1$ where $b>0$ is the first integer for which $s(G,b) < 2^b$.

Lemma 3.2. If $G_1$ and $G_2$ are two classes of sets from $R^m$ then $s(G_1 \cap G_2,n) \le s(G_1,n)\,s(G_2,n)$ where $G_1 \cap G_2 = \{A_1 \cap A_2 \mid A_1 \in G_1, A_2 \in G_2\}$.

Lemma 3.3. If $G$ is the class of all left half-infinite intervals from $R$, then $s(G,n) \le 1+n$. If $G$ is the class of all intervals from $R$, then $s(G,n) \le 1+n^2$.

The proof of the first part of lemma 3.3 is trivial. For the second part, notice that $s(G,n) \le 1+n+(n-1)+\ldots+1 = 1+n(n+1)/2 \le 1+n^2$.
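The counting argument behind lemma 3.3 can be checked by brute force. The sketch below is only an illustration: the function names are hypothetical, and it assumes that for one-dimensional points it suffices to try thresholds and interval end points located at the sample points themselves.

```python
def count_halfline_traces(points):
    """Count distinct subsets {x_i : x_i <= t} picked out by half-lines (-inf, t]."""
    pts = sorted(set(points))
    traces = {frozenset()}                      # t below every sample point
    for t in pts:                               # thresholds at sample points suffice
        traces.add(frozenset(p for p in pts if p <= t))
    return len(traces)


def count_interval_traces(points):
    """Count distinct subsets picked out by intervals [a, b] of the real line."""
    pts = sorted(set(points))
    traces = {frozenset()}                      # an interval missing every point
    for i in range(len(pts)):
        for j in range(i, len(pts)):            # an interval picks a run of consecutive points
            traces.add(frozenset(pts[i:j + 1]))
    return len(traces)
```

For n distinct points the two counts are 1+n and 1+n(n+1)/2, in agreement with the lemma and its proof.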

From lemmas 3.2 and 3.3 we can conclude the following.

Lemma 3.4. (i) If $G$ is the class of all m-fold products of left half-infinite intervals from $R$, then

$$s(G,n) \le (1+n)^m. \qquad (3.17)$$

(ii) If $G$ is the class of all rectangles in $R^m$ (i.e., all m-fold products of intervals from $R$), then

$$s(G,n) \le (1+n^2)^m. \qquad (3.18)$$

The following lemma is also trivial.

Lemma 3.5. If $G_1 \subset G_2$ then $s(G_1,n) \le s(G_2,n)$.

Let $\mathcal{H}$ be the class of all closed and all open linear halfspaces in $R^m$ where an open halfspace is the set of points $x=(x_1,\ldots,x_m)$ in $R^m$ satisfying the inequality

$$\sum_{i=1}^{m} a_ix_i > a_0$$

for some sequence $a_0,a_1,\ldots,a_m$ of real numbers. A closed linear halfspace is just the complement of an open linear halfspace or, equivalently, a set of points satisfying the inequality

$$\sum_{i=1}^{m} a_ix_i \ge a_0$$

for some sequence $a_0,a_1,\ldots,a_m$ of real numbers.

Let $\mathcal{S}_k$, $k>0$, be the class of all open and all closed spheres in $R^m$ where the norm used is

$$\|y\|_k = \begin{cases} \left(\sum_{i=1}^{m}|y_i|^k\right)^{1/k}, & k<\infty \\ \max_{1 \le i \le m}|y_i|, & k=\infty \end{cases} \qquad (3.19)$$

and $y=(y_1,\ldots,y_m) \in R^m$. We now have

Lemma 3.6.
$s(\mathcal{H},n) \le 1+n^{m+1}$
$s(\mathcal{S}_k,n) \le 1+n^{2+m(k-1)}$, $k$ even integer
$s(\mathcal{S}_\infty,n) \le (1+n^2)^m$
$s(\mathcal{S}_1,n) \le (1+n^2)^m$.

Proof: The first part of lemma 3.6 is shown by Vapnik and Chervonenkis (1971, p. 266). For the third part, note that $\mathcal{S}_\infty$ is contained in the class of all rectangles from $R^m$ so that $s(\mathcal{S}_\infty,n) \le (1+n^2)^m$ by lemmas 3.4 and 3.5. For the fourth part, we remark that $\mathcal{S}_1$ equals $\mathcal{S}_\infty$ after a rotation of the axes in $R^m$. If $k \ge 2$, $k$ even, then it is clear that $\mathcal{S}_k \subset \mathcal{H}^*$ where $\mathcal{H}^*$ is the class of all linear halfspaces in $R^{1+m(k-1)}$. To see this, note that for $y \in R^m$ and $r>0$, a closed sphere $S(y,r)$ with center $y$ and radius $r$ is a linear halfspace if one considers $x_1,\ldots,x_m$, $x_1^2,\ldots,x_m^2$, $\ldots$, $x_1^{k-1},\ldots,x_m^{k-1}$ and $\sum_{i=1}^{m}x_i^k$ as the new variables. Therefore, by lemma 3.5 and the first part of lemma 3.6,

$$s(\mathcal{S}_k,n) \le s(\mathcal{H}^*,n) \le 1+n^{2+m(k-1)}.$$

Q.E.D.


3.3 Main Results

Theorem 3.1. Let $\mu'_n$ and $\mu''_{n'}$ be the empirical measures of $\mu$ with two independent samples, one of size $n$ and one of size $n'$. Let $\epsilon>0$ and $\alpha \in (0,1)$ be such that $1-2e^{-2\alpha^2n'\epsilon^2} > 0$. Let $G$ be a class of Borel sets from $R^m$ such that $\sup_{A \in G}|\mu_n(A)-\mu(A)|$ and $\sup_{A \in G}|\mu'_n(A)-\mu''_{n'}(A)|$ are random variables for all $n,n'$. Then,

$$P\{\sup_{A \in G}|\mu_n(A)-\mu(A)| \ge \epsilon\} \le \frac{2s(G,n+n')\,e^{-2n\epsilon^2(1-2\alpha-2n/(n+n'))}}{1-2e^{-2\alpha^2n'\epsilon^2}}. \qquad (3.20)$$

The proof of theorem 3.1 follows the argument in Vapnik and Chervonenkis (1971). We remark that $\alpha$ and $n'$ are arbitrary, a freedom which can be used to obtain an estimate for $P\{D_n \ge \epsilon\}$.

Theorem 3.2. There exists a constant $C_m$ such that for all $n$ and $\epsilon>0$

$$P\{D_n \ge \epsilon\} \le C_m n^{2m}e^{-2n\epsilon^2}. \qquad (3.21)$$

In particular, (3.21) holds with $C_m = 4e^35^m$.

Notice that with $\alpha=1/2$, $n'=n$, (3.20) implies (3.15), which is the bound of Vapnik and Chervonenkis (1971). From (3.15) and (3.17), we deduce

$$P\{D_n \ge \epsilon\} \le 4(1+2n)^me^{-n\epsilon^2/8} \qquad (3.22)$$

which is valid for all $\epsilon>0$ and $n \ge 1$. From lemma 3.6 and (3.15) it is also possible to derive the following inequalities.

$$P\{\sup_{A \in \mathcal{S}_k}|\mu_n(A)-\mu(A)| \ge \epsilon\} \le 4(1+2n)^{2+m(k-1)}e^{-n\epsilon^2/8} \quad (k \ge 2,\ k \text{ even}) \qquad (3.23)$$

$$P\{\sup_{A \in \mathcal{S}_\infty}|\mu_n(A)-\mu(A)| \ge \epsilon\} \le 4(1+2n)^{2m}e^{-n\epsilon^2/8}. \qquad (3.24)$$
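To get a feeling for the size of these bounds, the right-hand sides of (3.21) (with $C_m = 4e^35^m$) and (3.22) can be evaluated numerically. The following is a minimal sketch with hypothetical function names; the computation is done on the log scale only to avoid overflow for large n and m.

```python
import math


def explicit_bound_321(n, eps, m):
    """Right-hand side of (3.21) with C_m = 4 e^3 5^m: C_m n^(2m) exp(-2 n eps^2)."""
    log_bound = (math.log(4.0) + 3.0 + m * math.log(5.0)
                 + 2.0 * m * math.log(n) - 2.0 * n * eps ** 2)
    return math.exp(log_bound)


def vc_bound_322(n, eps, m):
    """Right-hand side of (3.22): 4 (1 + 2n)^m exp(-n eps^2 / 8)."""
    log_bound = math.log(4.0) + m * math.log(1.0 + 2.0 * n) - n * eps ** 2 / 8.0
    return math.exp(log_bound)
```

For example, with n = 10000, eps = 0.05 and m = 1, the bound (3.22) still exceeds 1 while (3.21) is already very small, which reflects the larger constant in the exponent of (3.21).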

We will next present a uniform counterpart to Bennett's inequality (3.5). Let $\mu'_n$ and $\mu''_n$ be empirical measures of $\mu$ with two independent samples, both of size $n$. Then the following is true.

Theorem 3.3. Let $\epsilon>0$ and let $G$ be a class of Borel sets from $R^m$ which is such that $\sup_{A \in G}\mu_n(A)$, $\sup_{A \in G}|\mu_n(A)-\mu(A)|$ and $\sup_{A \in G}|\mu'_n(A)-\mu''_n(A)|$ are random variables for all $n$. If

$$\sup_{A \in G}\mu(A) \le g \le 1/2 \qquad (3.25)$$

then

$$P\{\sup_{A \in G}|\mu_n(A)-\mu(A)| \ge \epsilon\} \le 4s(G,2n)e^{-n\epsilon^2/(64g+4\epsilon)} + 2P\{\sup_{A \in G}\mu_{2n}(A) > 2g\} \qquad (3.26)$$

for all $n$ with $n \ge 8g/\epsilon^2$.

In nonparametric density estimation an important class of Borel sets is the class of all the sets with bounded radius under the norm $\|\cdot\|$. The following theorem will prove very useful in upper bounding the term $P\{\sup_{A \in G}\mu_{2n}(A) > 2g\}$ in (3.26).

Theorem 3.4. Let $G$ be a class of Borel sets such that

$$\sup_{A \in G}\ \sup_{y \in A, x \in A}\|y-x\| \le r < \infty \qquad (3.27)$$

where $\|\cdot\|$ is any norm on $R^m$. If

$$\sup_{x \in R^m}\mu(S(x,2r)) \le g \le 1/2,$$

where $S(x,2r) = \{y : y \in R^m;\ \|y-x\| \le 2r\}$, then

$$P\{\sup_{A \in G}\mu_{2n}(A) > 2g\} \le 2ne^{-ng/10} \qquad (3.28)$$

for all $n$ with $n \ge 1/g$.

As an example, let $G_r$ be the class of all rectangles from $R^m$ with the property (3.27). Observe that

$$\sup_{x}\mu(S(x,2r)) \le \sup_{A \in G_{4r}}\mu(A)$$

and that the measurability condition of theorem 3.3 is satisfied for $G_r$ for all $r$. Therefore, combining lemmas 3.5, 3.6 and theorems 3.3 and 3.4, yields, for all $\epsilon>0$,

$$P\{\sup_{A \in G_r}|\mu_n(A)-\mu(A)| \ge \epsilon\} \le 4(1+2n)^{2m}e^{-n\epsilon^2/(64g+4\epsilon)} + 4ne^{-ng/10} \qquad (3.29)$$

for all $n \ge 1/g$, $n \ge 8g/\epsilon^2$, provided that

$$\sup_{A \in G_{4r}}\mu(A) \le g \le 1/2.$$

Let us make another interesting observation. When $\mu$ puts all its mass on a countable set of points, say $\{x_1,x_2,\ldots\}$, then it is clear that $\mu_n(\{x_j\}) = N_j/n$ where $N_j$ is the number of $X_i$'s such that $X_i=x_j$. From (3.11) we can conclude that there exists a constant $C>0$ such that for all $\epsilon>0$

p[  (iNj/n  - ^({Xj})  | * 6)j  sCe‘2ne2  . (3.30) 

The space from which the $x_j$ are taken is irrelevant. We remark that (3.30) can be used to improve some of the bounds that were obtained in the proof of theorem 2.8. We should mention that (3.6) is not valid for such atomic measures. In fact, it is true that

$$\sup_{A \in \mathcal{B}}|\mu_n(A)-\mu(A)| \to 0 \ \text{wp1} \qquad (3.31)$$

if $\mu$ is atomic. Unfortunately, the rate of convergence to 0 is not uniform over all such $\mu$ because for all $0 < \epsilon \le 1/4$ and all $n$,

$$\sup_{\text{all atomic measures } \mu} P\{\sup_{A \in \mathcal{B}}|\mu_n(A)-\mu(A)| \ge \epsilon\} = 1. \qquad (3.32)$$

Both properties (3.31) and (3.32) are proved in the next section.

3.4 Proofs

Proof of theorem 3.1

Let $S'_n = (X_1,\ldots,X_n)$ and $S''_{n'} = (X_{n+1},\ldots,X_{n+n'})$ where $X_1,\ldots,X_{n+n'}$ are iid random vectors from $R^m$ with probability measure $\mu$. Denote the total $(n+n')$-size sample by $S_{n+n'}$ and let $\mu'_n$, $\mu''_{n'}$ denote the empirical probability measures for $S'_n$ and $S''_{n'}$ respectively. For each Borel subset $A$ of $R^m$, let

$$\rho_A^{(n,n')} = |\mu'_n(A)-\mu''_{n'}(A)|$$

and let

$$\rho^{(n,n')} = \sup_{A \in G}\rho_A^{(n,n')}.$$

Define

$$\pi^{(n)} = \sup_{A \in G}|\mu'_n(A)-\mu(A)|$$

and note that $\rho^{(n,n')}$ and $\pi^{(n)}$ are random variables by supposition. Let

$$C_\pi = \{\pi^{(n)} > \epsilon\}, \quad C_\rho = \{\rho^{(n,n')} \ge (1-\alpha)\epsilon\}$$

where $0 < \alpha < 1$ and $\epsilon > 0$. Let $P$, $P'$ and $P''$ be the probability measures induced by $S_{n+n'}$, $S'_n$ and $S''_{n'}$ in $R^{(n+n')m}$, $R^{nm}$ and $R^{n'm}$. We will first show that

$$P\{C_\rho\} \ge (1-2e^{-2\alpha^2n'\epsilon^2})\,P'\{C_\pi\}.$$

Indeed, if $I_{C_\pi}=1$ (i.e., $\sup_{A \in G}|\mu'_n(A)-\mu(A)| > \epsilon$), then there exists an $A_0 \in G$ ($A_0$ depends upon $S'_n$ of course) such that $|\mu'_n(A_0)-\mu(A_0)| > \epsilon$. Thus, on $\{S'_n \in C_\pi\}$,

$$\{|\mu''_{n'}(A_0)-\mu(A_0)| \le \alpha\epsilon\} \subset \{\rho_{A_0}^{(n,n')} \ge (1-\alpha)\epsilon\} \subset \{\rho^{(n,n')} \ge (1-\alpha)\epsilon\} = C_\rho.$$

Therefore,

$$\begin{aligned}
P\{C_\rho\} &= \int_{R^{(n+n')m}} I_{C_\rho}\,dP \\
&= \int_{R^{nm}} dP' \int_{R^{n'm}} I_{C_\rho}\,dP'' \\
&\ge \int_{C_\pi} dP' \int_{R^{n'm}} I_{C_\rho}\,dP'' \\
&\ge P'\{C_\pi\} \inf_{A_0 \in G} P''\{|\mu''_{n'}(A_0)-\mu(A_0)| \le \alpha\epsilon\} \\
&\ge (1-2e^{-2\alpha^2n'\epsilon^2})\,P'\{C_\pi\}
\end{aligned}$$

by Hoeffding's inequality (3.4).

Consider $P\{C_\rho\}$. Let $T_iS_{n+n'}$ denote a permutation of the $X_1,\ldots,X_{n+n'}$ and let $\rho^{(n,n')}(i)$ and $\rho_A^{(n,n')}(i)$ be defined as $\rho^{(n,n')}$ and $\rho_A^{(n,n')}$ if $T_iS_{n+n'}$ replaces $S_{n+n'}$ in the definitions. Of course, for all integrable functions $g(S_{n+n'})$, it is true that

$$\int_{R^{(n+n')m}} g(S_{n+n'})\,dP = \int_{R^{(n+n')m}} g(T_iS_{n+n'})\,dP.$$

We say that two sets $A_1$ and $A_2$ from $R^m$ are $S_{n+n'}$-equivalent if both sets define the same intersection with $\{X_1,\ldots,X_{n+n'}\}$, that is, no $X_j$ takes values in the difference set $A_1A_2^c \cup A_1^cA_2$, $1 \le j \le n+n'$. $S_{n+n'}$-equivalent sets have the following nice property. If $T_iS_{n+n'}$ is used in the definitions of $\mu'_n$ and $\mu''_{n'}$, then $\mu'_n(A_1) = \mu'_n(A_2)$ and $\mu''_{n'}(A_1) = \mu''_{n'}(A_2)$ for all possible $(n+n')!$ permutations $T_i$.

Given $S_{n+n'} \in R^{(n+n')m}$ let $G' \subset G$ be a class of sets from $G$ with the property that every set $A \in G$, $A \notin G'$ is $S_{n+n'}$-equivalent to some set $B \in G'$ and that no two sets from $G'$ are $S_{n+n'}$-equivalent to each other. Obviously, regardless of $S_{n+n'}$, $G'$ has never more than $s(G,n+n')$ component sets. To make the dependence on $S_{n+n'}$ explicit, we will write $G'(S_{n+n'})$.

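Proceeding as in Vapnik and Chervonenkis (1971),

$$\begin{aligned}
P\{C_\rho\} &= P\{\rho^{(n,n')} \ge (1-\alpha)\epsilon\} \\
&= \int_{R^{(n+n')m}} \frac{1}{(n+n')!}\sum_{i=1}^{(n+n')!} I_{\{\rho^{(n,n')}(i) \ge (1-\alpha)\epsilon\}}\,dP \\
&= \int_{R^{(n+n')m}} \frac{1}{(n+n')!}\sum_{i=1}^{(n+n')!} \sup_{A \in G} I_{\{\rho_A^{(n,n')}(i) \ge (1-\alpha)\epsilon\}}\,dP \\
&= \int_{R^{(n+n')m}} \frac{1}{(n+n')!}\sum_{i=1}^{(n+n')!} \sup_{A \in G'(S_{n+n'})} I_{\{\rho_A^{(n,n')}(i) \ge (1-\alpha)\epsilon\}}\,dP \\
&\le \int_{R^{(n+n')m}} \sum_{A \in G'(S_{n+n'})} \frac{1}{(n+n')!}\sum_{i=1}^{(n+n')!} I_{\{\rho_A^{(n,n')}(i) \ge (1-\alpha)\epsilon\}}\,dP.
\end{aligned}$$

Let $Y_1,\ldots,Y_n,Y_{n+1},\ldots,Y_{n+n'}$ be random variables whose values are obtained by picking at random, but without replacement, from $I_{\{X_1 \in A\}},\ldots,I_{\{X_{n+n'} \in A\}}$ (thus, all the $Y_i$ take values in $\{0,1\}$). Then, writing $\mu_{n+n'}$ for the empirical measure of the total sample $S_{n+n'}$,

$$\begin{aligned}
\frac{1}{(n+n')!}\sum_{i=1}^{(n+n')!} I_{\{\rho_A^{(n,n')}(i) \ge (1-\alpha)\epsilon\}}
&= P\left\{\left|\frac{1}{n}\sum_{i=1}^{n}Y_i - \frac{1}{n'}\sum_{i=n+1}^{n+n'}Y_i\right| \ge (1-\alpha)\epsilon\right\} \\
&= P\left\{\left|\frac{1}{n}\sum_{i=1}^{n}Y_i - \mu_{n+n'}(A)\right| \ge (1-\alpha)\epsilon n'/(n+n')\right\} \\
&\le 2e^{-2n\epsilon^2((1-\alpha)n'/(n+n'))^2} \\
&\le 2e^{-2n\epsilon^2(1-2\alpha-2n/(n+n'))}
\end{aligned}$$

by Hoeffding's inequality for sampling without replacement from a set of binary valued elements with mean $\mu_{n+n'}(A)$ (see lemmas 4.1 and 4.2). Notice that the last inequality holds true for all sets $A \subset R^m$. So, combining the last two chains of inequalities yields

$$\begin{aligned}
P\{C_\rho\} &\le \int_{R^{(n+n')m}} \sum_{A \in G'(S_{n+n'})} 2e^{-2n\epsilon^2(1-2\alpha-2n/(n+n'))}\,dP \\
&\le s(G,n+n')\,2e^{-2n\epsilon^2(1-2\alpha-2n/(n+n'))}
\end{aligned}$$

which, together with

$$P'\{C_\pi\} \le P\{C_\rho\}/(1-2e^{-2\alpha^2n'\epsilon^2})$$

concludes the proof of theorem 3.1.

Q.E.D.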

Proof of theorem 3.2

In (3.20), let $G$ be the class of m-fold products of left-infinite intervals from $R$ and let $\alpha = 1/(2n\epsilon^2)$, $n' = 3n^2\epsilon^2$, and let $n$ be so large that $1 < n\epsilon^2$, so that $\alpha < 1$. Recall that

$$s(G,n+n') \le (1+n+n')^m < (1+n^2\epsilon^2+3n^2\epsilon^2)^m = (1+4n^2\epsilon^2)^m.$$

Then, since

$$2\alpha^2n'\epsilon^2 \ge 3/2 > \ln 4$$

and

$$4\alpha n\epsilon^2 = 2, \quad 4n^2\epsilon^2/(n+n') > 4n^2\epsilon^2/(n^2\epsilon^2+3n^2\epsilon^2) = 1,$$

we have that

$$P\{D_n \ge \epsilon\} \le \frac{2(1+4n^2\epsilon^2)^m}{1-2e^{-2\alpha^2n'\epsilon^2}}\,e^{-2n\epsilon^2}\,e^{4\alpha n\epsilon^2}\,e^{4n^2\epsilon^2/(n+n')} \le 4(1+4n^2\epsilon^2)^m e^{-2n\epsilon^2}e^3$$

which is valid for $n\epsilon^2 > 1$. If $n\epsilon^2 \le 1$ however, the bound is smallest for $\epsilon = 0$ or $\epsilon = 1/\sqrt{n}$. In both cases, the bound is greater than 1, so that we can say that it is applicable for all $n$ and all $\epsilon > 0$. Since we can assume that $\epsilon \le 1$, we obtain

$$P\{D_n \ge \epsilon\} \le 4e^3(1+4n^2)^me^{-2n\epsilon^2} \le 4e^3(5n^2)^me^{-2n\epsilon^2}.$$

To see that the measurability condition of theorem 3.1 is fulfilled, note that for all $\epsilon > 0$

$$\{\sup_{A \in G}|\mu_n(A)-\mu(A)| > \epsilon\} = \{\sup_{x \in R^m}|F_n(x)-F(x)| > \epsilon\} = \{\sup_{x \in D}|F_n(x)-F(x)| > \epsilon\}$$

where $F_n$ is the empirical distribution function that corresponds to $\mu_n$ and $D$ is a countable dense subset of $R^m$. If $F'_n$ and $F''_{n'}$ correspond to $\mu'_n$ and $\mu''_{n'}$ respectively, then we also have that

$$\{\sup_{A \in G}|\mu'_n(A)-\mu''_{n'}(A)| > \epsilon\} = \{\sup_{x \in D}|F'_n(x)-F''_{n'}(x)| > \epsilon\}.$$

Q.E.D.

Proof of theorem 3.3

The proof makes repeated use of Bennett's inequality (3.5). It was pointed out by Hoeffding (1963) that (3.5) also holds if

$$\mu_n(A) = \frac{1}{n}\sum_{i=1}^{n}Y_i$$

where $Y_1,\ldots,Y_n$ are random variables whose values are obtained by sampling without replacement from a population $y_1,\ldots,y_k$, $k \ge n$, with $y_i \in \{0,1\}$, $1 \le i \le k$, and

$$\mu(A) = \frac{1}{k}\sum_{i=1}^{k}y_i.$$

The proof of theorem 3.1 is followed with $\alpha = 1/2$, $n' = n$. Using the same notation, we first note that

$$\begin{aligned}
P\{C_\rho\} &\ge \inf_{A \in G} P\{|\mu''_n(A)-\mu(A)| \le \epsilon/2\}\,P'\{C_\pi\} \\
&\ge \left[1-\sup_{A \in G}\frac{\mu(A)}{n(\epsilon/2)^2}\right]P'\{C_\pi\} \\
&\ge [1-(4g/n\epsilon^2)]\,P'\{C_\pi\} \\
&\ge P'\{C_\pi\}/2
\end{aligned}$$

by using Chebyshev's inequality, the fact that $n \ge 8g/\epsilon^2$ and

$$\sup_{A \in G}\mu(A) \le g \le 1/2.$$

As in theorem 3.1, we will upper bound $P\{C_\rho\}$. For every event $E$,

$$P\{C_\rho\} \le P\{\rho^{(n,n)} \ge \epsilon/2,\ E\} + P\{E^c\}$$

where $E^c$ denotes the complement of $E$. Let $E = \{\sup_{A \in G}\mu_{2n}(A) \le 2g\}$ which is an event by hypothesis. Thus,

$$\begin{aligned}
P\{C_\rho\} &\le \int_{R^{2nm}} \sum_{A \in G'(S_{2n})} I_E\,\text{Prob}\left\{\left|\sum_{i=1}^{n}Y_i/n-\mu_{2n}(A)\right| \ge \epsilon/4\right\}dP + P\{E^c\} \\
&\le \int_{R^{2nm}} \sum_{A \in G'(S_{2n})} I_E\,2e^{-n(\epsilon/4)^2/(2\mu_{2n}(A)+\epsilon/4)}\,dP + P\{E^c\} \\
&\le 2s(G,2n)e^{-n\epsilon^2/(64g+4\epsilon)} + P\{\sup_{A \in G}\mu_{2n}(A) > 2g\}
\end{aligned}$$

since, on $E$, $\mu_{2n}(A) \le 2g$. Thus, for all $n \ge 8g/\epsilon^2$,

$$P'\{C_\pi\} \le 4s(G,2n)e^{-n\epsilon^2/(64g+4\epsilon)} + 2P\{\sup_{A \in G}\mu_{2n}(A) > 2g\}.$$

Q.E.D.


Proof of theorem 3.4

Let $G$ satisfy (3.27) with a given $0 \le r < \infty$. Let $\mu^*_{2n-1}$ be the empirical measure for $X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_{2n}$ where $X_1,\ldots,X_{2n}$ are iid random vectors from $R^m$ with probability measure $\mu$. Note that

$$\begin{aligned}
P\{\sup_{A \in G}\mu_{2n}(A) > 2g\} &\le P\left\{\bigcup_{i=1}^{2n}\{\mu_{2n}(S(X_i,2r)) > 2g\}\right\} \\
&= P\left\{\bigcup_{i=1}^{2n}\{2n\,\mu_{2n}(S(X_i,2r)) > 2g \cdot 2n\}\right\} \\
&\le P\left\{\bigcup_{i=1}^{2n}\{(2n-1)\mu^*_{2n-1}(S(X_i,2r)) > 4gn-1\}\right\} \\
&\le \sum_{i=1}^{2n} P\{\mu^*_{2n-1}(S(X_i,2r)) > (4gn-1)/(2n-1)\} \\
&\le 2n\,P\{\mu^*_{2n-1}(S(X_1,2r)) > 3g/2\} \\
&\le 2n \sup_{x \in R^m} P\{\mu_{2n-1}(S(x,2r)) > 3g/2\} \\
&\le 2n \sup_{x \in R^m} P\{\mu_{2n-1}(S(x,2r))-\mu(S(x,2r)) > g/2\} \\
&\le 2n\,e^{-(2n-1)(g/2)^2/(2g+g/2)} \\
&= 2n\,e^{-(2n-1)g/10} \\
&\le 2n\,e^{-ng/10}
\end{aligned}$$

for all $n$ with $ng \ge 1$. To derive this chain of inequalities we used the one-sided version of Bennett's inequality (Bennett, 1962).

Q.E.D.

Proof of (3.31) and (3.32)

Let $\mu$ put all its mass on $\{x_1,x_2,\ldots\}$ and let $q_k = \mu(\{x_k\})$, $k=1,2,\ldots$. Let

$$U_n = \sup_{A \in \mathcal{B}}|\mu_n(A)-\mu(A)|$$

where $\mu_n$ is the empirical measure for $X_1,\ldots,X_n$. To show (3.31), we use the Borel-Cantelli lemma and the fact that for every $\epsilon > 0$,

$$\sum_{n=1}^{\infty} P\{U_n \ge \epsilon\} < \infty.$$

Indeed, for a given $\epsilon > 0$, pick $N \ge 3$ so large that

JL  * - */3 

and  note  that 


k=l 


Therefore, 


N 

„*•)  * P{£  UJKD-O  * ,}  + PU  ( U J)*2«/31 

k«l  n k=N+-l 

+ Pt“n(A, 

2 k— N+l  k-N+1  _ 

, 2N  * e-2"<*/3)2  . (2N*!)  e'2"*2^ 

by Hoeffding's inequality (Hoeffding, 1963).

To show (3.32), fix $0 < \epsilon \le 1/4$ and $n \ge 1$. Let $q_k = 1/(2n)$ if $1 \le k \le 2n$ and 0 otherwise. Then, $P\{U_n \ge \epsilon\} \ge P\{n/(2n) \ge 2\epsilon\} = 1$.
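The construction used for (3.32) is easy to simulate. The sketch below is illustrative only (the function name is hypothetical). It relies on the standard identity that, for measures concentrated on countably many atoms, the supremum over all sets equals half the sum of the absolute differences of the atom masses. With $\mu$ uniform on $2n$ atoms and a sample of size $n$, at least $n$ atoms are never observed, so the supremum is at least 1/2.

```python
import random
from collections import Counter


def sup_deviation_uniform_atoms(n, rng):
    """sup_A |mu_n(A) - mu(A)| when mu is uniform on 2n atoms and the sample size is n."""
    atoms = 2 * n
    q = 1.0 / atoms                                  # mass of each atom under mu
    counts = Counter(rng.randrange(atoms) for _ in range(n))
    # For atomic measures the supremum over sets is half the L1 distance.
    return 0.5 * sum(abs(counts.get(j, 0) / n - q) for j in range(atoms))
```

Whatever the sample, the returned value is at least 1/2, so the deviation exceeds any $\epsilon \le 1/4$ with probability one for this choice of $\mu$.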


Q.E.D. 


Chapter 4 NONPARAMETRIC DENSITY ESTIMATION

4.1 Introduction

Let $X_1,\ldots,X_n$ be a sequence of independent, identically distributed random vectors taking values in $R^m$. Assume that the common probability measure $\mu$ of the sequence is absolutely continuous with respect to Lebesgue measure with a probability density $f$. If $\mu_n$ is the empirical measure on $\mathcal{B}$ for $X_1,\ldots,X_n$ we can easily see that

$$\sup_{A \in \mathcal{B}}|\mu_n(A)-\mu(A)| = 1$$

since $\mu_n$ is atomic with mass $1/n$ at the points $X_1,\ldots,X_n$. Suppose now we look for an estimate $\mu_n$ of $\mu$ for which

$$\sup_{A \in \mathcal{B}}|\mu_n(A)-\mu(A)| \to 0 \ \text{wp1}. \qquad (4.1)$$

Assuming that $\mu_n$ has a probability density $f_n$ we see that (4.1) will follow whenever

$$\int_{R^m}|f_n(x)-f(x)|\,dx \to 0 \ \text{wp1}.$$

Indeed,

$$\sup_{A \in \mathcal{B}}|\mu_n(A)-\mu(A)| = \sup_{A \in \mathcal{B}}\left|\int_A f_n(x)\,dx - \int_A f(x)\,dx\right| \le \sup_{A \in \mathcal{B}}\int_A|f_n(x)-f(x)|\,dx \le \int_{R^m}|f_n(x)-f(x)|\,dx. \qquad (4.2)$$

This shows the importance of $\int|f_n(x)-f(x)|\,dx$ for the study of the uniform convergence properties of the corresponding measure $\mu_n$.

Of course, from the discrimination viewpoint, we are also interested in estimates $f_n$ for which $|f_n(x)-f(x)| \to 0$ wp1 ae($\mu$) (see theorem 2.3). For this purpose it suffices to find an $f_n$ for which $|f_n(x)-f(x)| \to 0$ wp1 almost everywhere in $x$. If $f$ is almost everywhere continuous, it suffices to establish convergence on the continuity set of $f$. If $f$ is uniformly continuous, one is tempted to believe that $\sup_x|f_n(x)-f(x)| \to 0$ wp1 if $|f_n(x)-f(x)| \to 0$ wp1 for all $x$. This is not quite true but the conditions under which uniform convergence takes place are only mildly stronger than the conditions that are needed to insure the pointwise convergence for the nonparametric density estimates that will be discussed in this chapter. We will investigate the asymptotic behavior of two popular estimates.


(i) The Parzen-Rosenblatt estimate (or kernel estimate) (Parzen, 1962; Rosenblatt, 1957)

$$f_n(x) = n^{-1}\sum_{i=1}^{n} h_n^{-m}K((X_i-x)/h_n), \quad x \in R^m,$$

where $\{h_n\}$ is a sequence from $(0,\infty)$ and $K$ is a probability density function on $R^m$.

(ii) The Loftsgaarden-Quesenberry estimate (or histogram estimate, LQ estimate) (Fix and Hodges, 1951; Loftsgaarden and Quesenberry, 1965)

$$f_n(x) = C\,(k_n/n)/\|X_{k_n}-x\|^m, \quad x \in R^m,$$

where $\{k_n\}$ is a sequence of integers with $1 \le k_n \le n$, $C$ is a positive constant depending only on $m$ and the norm $\|\cdot\|$ and $X_{k_n}$ is the $k_n$-th nearest neighbor to $x$ among $X_1,\ldots,X_n$.

It is clear that $f_n$ itself is a density if the kernel method is used. With the LQ estimate, $\int f_n(x)\,dx = \infty$ for all $n$ so that the LQ estimate is not suited for applications where one wants $\int|f_n(x)-f(x)|\,dx$ to converge to 0 in some probabilistic sense as $n$ tends to infinity. The advantage of the LQ estimate is that it is usually easier to find a $k_n$ for (ii) than an $h_n$ for (i) insuring that the corresponding estimate $f_n$ is sufficiently smooth and at the same time sufficiently detailed. Other nonparametric density estimates not treated here include the spline method (Wahba, 1973, 1975a, 1975b) and the orthogonal series expansion. For a survey of available methods, the reader is referred to Wegman (1972).
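The two estimates can be sketched in a few lines for the case $m=1$. Everything specific in this illustration is an assumption: the function names are hypothetical, $K$ is taken to be the standard normal density, the norm is the absolute value, and the constant $C$ of the LQ estimate is set to 1/2 (the reciprocal of the length of the unit ball $[-1,1]$).

```python
import math


def kernel_estimate(x, sample, h):
    """Parzen-Rosenblatt estimate for m = 1 with a standard normal kernel (assumed)."""
    n = len(sample)
    total = sum(math.exp(-0.5 * ((xi - x) / h) ** 2) for xi in sample)
    return total / (n * h * math.sqrt(2.0 * math.pi))


def lq_estimate(x, sample, k):
    """Loftsgaarden-Quesenberry estimate for m = 1 with C = 1/2 (assumed)."""
    n = len(sample)
    distances = sorted(abs(xi - x) for xi in sample)
    radius = distances[k - 1]                  # distance to the k-th nearest neighbor
    return (k / n) / (2.0 * radius)            # undefined if radius is 0 (x hits a tie)
```

In this sketch the kernel estimate integrates to one for every sample, whereas the LQ estimate decays only like $1/|x|$ for large $|x|$, which is why its integral is infinite.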

Let us formally define the asymptotic properties that we will consider in this chapter. If $B$ is a Borel set, then we say that $\{f_n\}$ is a pointwise weakly (strongly) consistent estimate for $\mu$ on $B$ if there exists a density $f$ for $\mu$ such that

$$|f_n(x)-f(x)| \to 0 \ \text{in probability (wp1)}$$

for all $x \in B$. Further, we say that $\{f_n\}$ is a weakly (strongly) uniformly consistent estimate for $\mu$ on $B$ if there exists a density $f$ for $\mu$ with

$$\sup_{x \in B}|f_n(x)-f(x)| \to 0 \ \text{in probability (wp1)}.$$

If $B$ is omitted, it is assumed to be $R^m$.

In addition to these properties we are, for kernel estimates, interested in the convergence to 0 of

$$\|f_n(z)-f(z)\|_r = \begin{cases} \left(\int|f_n(x)-f(x)|^r\,dx\right)^{1/r}, & 0 < r < \infty \\ \operatorname{ess\,sup}_x|f_n(x)-f(x)|, & r = \infty \end{cases}$$

where the essential supremum is with respect to the Lebesgue measure.

4.2 Auxiliary Results

Let $Y_1,\ldots,Y_n$ be independent, identically distributed random variables with $E\{Y_1\} = 0$ and let $S_n = \sum_{i=1}^{n}Y_i$. The main tools needed below are inequalities linking $E\{|S_n/n|^r\}$ and $P\{|S_n/n| \ge \epsilon\}$ to the various statistics of $Y_1$.

Lemma 4.1. If $Y_1$ takes values in $[a,b]$ with probability one, and if $g=b-a$, $\sigma^2 = E\{Y_1^2\}$, then, for all $\epsilon>0$,

$$P\{|S_n/n| \ge \epsilon\} \le 2e^{-2n(\epsilon/g)^2} \qquad (4.3)$$

and

$$P\{|S_n/n| \ge \epsilon\} \le 2e^{-n(\epsilon/g)\left((1+\sigma^2/g\epsilon)\ln(1+g\epsilon/\sigma^2)-1\right)} \le 2e^{-n\epsilon^2/(2\sigma^2+g\epsilon)}. \qquad (4.4)$$

We remark that (4.3) is due to Hoeffding (1963) and that (4.4) is usually attributed to Bennett (1962). The inequality on the right hand side of (4.4) is trivial if one notices that $\ln(1+u) \ge 2u/(2+u)$ for all $u > 0$.
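The three bounds of lemma 4.1 are easy to compare numerically. The sketch below uses hypothetical function names; `var` stands for $\sigma^2$ and is assumed to be strictly positive.

```python
import math


def hoeffding_bound(n, eps, g):
    """Right-hand side of (4.3)."""
    return 2.0 * math.exp(-2.0 * n * (eps / g) ** 2)


def bennett_bound(n, eps, g, var):
    """Middle expression of (4.4)."""
    u = g * eps / var
    return 2.0 * math.exp(-n * (eps / g) * ((1.0 + 1.0 / u) * math.log1p(u) - 1.0))


def relaxed_bennett_bound(n, eps, g, var):
    """Right-hand side of (4.4), obtained from ln(1+u) >= 2u/(2+u)."""
    return 2.0 * math.exp(-n * eps ** 2 / (2.0 * var + g * eps))
```

When $\sigma^2$ is small compared with $g\epsilon$, both forms of (4.4) are much smaller than (4.3); this is the regime exploited for sets of small probability in chapter 3.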


Lemma 4.2. (Hoeffding, 1963). Let $k \ge n$ and let $\{y_1,\ldots,y_k\}$ be a set with $y_i \in [a,b]$ for all $i$. Let further $g=b-a$, $\frac{1}{k}\sum_{i=1}^{k}y_i = 0$ and

$$\sigma^2 = \frac{1}{k}\sum_{i=1}^{k}y_i^2.$$

If $Y_1,\ldots,Y_n$ are random variables obtained by sampling without replacement from $\{y_1,\ldots,y_k\}$, then (4.3) and (4.4) remain valid.

It should be noted that (4.3) can be strengthened for the case of sampling without replacement (Serfling, 1974).


Lemma 4.3. If $Y_1$ takes values in $[a,b]$ with probability one, and if $g=b-a$, $\sigma^2 = E\{Y_1^2\}$, $E\{Y_1\} = 0$ and $r \ge 1$, then

$$E\{|S_n/n|^r\} \le r\Gamma(r/2)(g^2/2n)^{r/2} \qquad (4.5)$$

and

$$E\{|S_n/n|^r\} \le r\Gamma(r/2)(4\sigma^2/n)^{r/2} + 2r\Gamma(r)(2g/n)^r \qquad (4.6)$$

where $\Gamma$ is the gamma function.

In some cases we want to obtain an upper bound for the r-th moment of $S_n/n$ that is a function of $E\{|Y_1|^r\}$. We have the following result due to Whittle (1960).

Lemma 4.4. If $E\{|Y_1|^r\} < \infty$ with $r \ge 2$, then

$$E\{|S_n/n|^r\} \le 2^{3r/2}\pi^{-1/2}\Gamma((r+1)/2)\,E\{|Y_1|^r\}/n^{r/2}. \qquad (4.7)$$

It was shown by Rosen (1970) that (4.7) is not very tight if $Y_1$ has a distribution that is close to the Poisson distribution. He showed that the following lemma can sometimes provide stronger bounds.
i 1 2r 

Lemma  4,5.  If  E { | Y^  | 1 < ® for  some  integer  r s 1 , and  if  for  all 

1 < k sr, 


2k 


s a 


b 


for  some  a a 0,  b a 0,  then 

E { | S^/n  | 2r } s cfMax  {(a/n)r  (ab)r;  (a/n)2r  1 (ab) } (4.8) 


where $c_r$ is a constant only depending upon $r$. Explicit expressions for $c_r$ can be found in Dharmadhikari and Jogdeo (1969).


4.3 Pointwise Consistency of the Parzen-Rosenblatt Estimator

Let $\{h_n\}$ be a sequence from $(0,\infty)$ and let $K$ be a Borel measurable function from $R^m$ to $[0,\infty]$. Then we call the random variable

$$f_n(x) = n^{-1}\sum_{i=1}^{n} h_n^{-m}K((X_i-x)/h_n)$$

the Parzen-Rosenblatt density estimate (Rosenblatt, 1957; Parzen, 1962). It is well known that $\{f_n\}$ is a pointwise weakly consistent estimate for $\mu$ on $Q(\mu)$, the set of continuity points of $\mu$ (i.e., the set of points $x$ for which some density of $\mu$ is continuous at $x$), provided that

$$h_n \to 0, \qquad (4.9)$$

$$nh_n^m \to \infty, \qquad (4.10)$$

$$K \text{ is a density on } R^m, \qquad (4.11)$$

$$\lim_{\|x\| \to \infty}\|x\|^mK(x) = 0, \text{ and} \qquad (4.12)$$

$$\sup_{x \in R^m}K(x) < \infty. \qquad (4.13)$$

(For $m=1$, see Parzen (1962), Rosenblatt (1957); for $m>1$, see Cacoullos (1965)). Under much stronger conditions on $\{h_n\}$ and $K$ (e.g., $K$ should satisfy a local Lipschitz condition, $h_{n+1}/h_n \to 1$ and so forth) Van Ryzin (1969) showed that $\{f_n\}$ is a pointwise strongly consistent estimate for $\mu$ on $Q(\mu)$. For $m=1$, Nadaraya (1965) shows the same thing if in addition to (4.9) - (4.13), $K$ is of bounded variation and

$$\sum_{n=1}^{\infty}e^{-\alpha nh_n^{2m}} < \infty \text{ for all } \alpha > 0.$$

Later, Moore and Yackel (1975) mention that Nadaraya's result remains valid for $m \ge 1$.

The main result of this section (theorem 4.2) is that it suffices to add the condition

$$\sum_{n=1}^{\infty}e^{-\alpha nh_n^m} < \infty \text{ for all } \alpha > 0 \qquad (4.14)$$

to the list of conditions (4.9) - (4.13) in order to be able to show that $\{f_n\}$ is a pointwise strongly consistent estimate for $\mu$ on $Q(\mu)$. Note that (4.14) is satisfied if

$$nh_n^m/\log n \to \infty \qquad (4.15)$$

and this shows the closeness of the conditions (4.14) and (4.10). In fact, we will prove an inequality that is strong enough to prove both the weak and strong consistency of $\{f_n\}$.
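As a concrete illustration of (4.14) and (4.15), consider the purely hypothetical bandwidth choice $h_n = n^{-1/(2m)}$, for which $nh_n^m = \sqrt{n}$. The sketch below (function names are assumptions) evaluates the ratio in (4.15) and partial sums of the series in (4.14); a finite computation of course only suggests, and does not prove, convergence.

```python
import math


def bandwidth(n, m):
    """Hypothetical bandwidth h_n = n^(-1/(2m)), so that n * h_n^m = sqrt(n)."""
    return n ** (-1.0 / (2 * m))


def ratio_415(n, m):
    """The quantity n h_n^m / log n appearing in (4.15)."""
    return n * bandwidth(n, m) ** m / math.log(n)


def partial_sum_414(alpha, m, upper):
    """Partial sum over n = 1..upper of exp(-alpha n h_n^m), the series in (4.14)."""
    return sum(math.exp(-alpha * n * bandwidth(n, m) ** m)
               for n in range(1, upper + 1))
```

For this choice the ratio grows like $\sqrt{n}/\log n$ and the terms of the series decay like $e^{-\alpha\sqrt{n}}$, so the partial sums stabilize quickly.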

We say that $\mu \in L_r$ ($r>0$) if $\|f(z)\|_r < \infty$ for some density (and thus all the densities) $f$ of $\mu$. We say that $\mu \in L_\infty$ if $\operatorname{ess\,sup} f(x) < \infty$ for some density $f$ of $\mu$ where the essential supremum is with respect to the Lebesgue measure on $R^m$. Before stating theorem 4.1 notice that

$$E\{f_n(x)\} = E\{h_n^{-m}K((X_1-x)/h_n)\} = \int h_n^{-m}K((y-x)/h_n)f(y)\,dy \qquad (4.16)$$

so that for any densities $f$ and $K$, $E\{f_n(x)\}$ is finite almost everywhere in view of $\int E\{f_n(x)\}\,dx = E\{\int f_n(x)\,dx\} = 1$. We first prove the following lemma which provides us with a uniform upper bound for $E\{f_n(x)\}$.

Lemma 4.6. If $\mu \in L_r$ where $1 \le r \le \infty$ and if $K$ is an essentially bounded density, then for any density $f$ for $\mu$,

$$\sup_x E\{f_n(x)\} \le \|f(z)\|_r\,(\|K(z)\|_\infty/h_n^m)^{1/r} \qquad (4.17)$$

where $\|\cdot\|_r$ is defined in section 4.1.

The following result is well-known but the proof is repeated for the sake of completeness.

Theorem 4.1. If $x \in Q(\mu)$, if $K$ is a density on $R^m$, if $h_n \to 0$ and if either $\mu \in L_\infty$ or $K$ satisfies (4.12), then

$$|E\{f_n(x)\}-f(x)| \to 0$$

for any density $f$ of $\mu$ that is continuous at $x$.

The main result of this section is the following theorem.

Theorem 4.2. If $x \in Q(\mu)$ and if (4.9) - (4.13) hold, then $|f_n(x)-f(x)| \to 0$ in probability where $f$ is any density for $\mu$ that is continuous at $x$. If in addition (4.14) holds, then $|f_n(x)-f(x)| \to 0$ wp1. If $\mu \in L_\infty$, then (4.12) is not needed in either part of the theorem.

The proof of theorem 4.2 is based upon Bennett's inequality (lemma 4.1). We remark that theorem 4.2 is essentially all we need to know about $f_n$ to use the kernel estimate in asymptotically optimal two-step decision rules. The following theorems are rather technical. In particular, theorem 4.4 states that under the conditions of theorem 4.2, $E\{|f_n(x)-f(x)|^s\} \to 0$ for all $1 \le s < \infty$.
n 

Theorem 4.3.

(i) If K is a density satisfying (4.13), if $n h_n^m \to \infty$, if $\mu \in L_r$ where $1 \le r \le \infty$ and if

$$n h_n^{m(1+1/r)} \to \infty, \tag{4.18}$$

then

$$\sup_x E\{|f_n(x) - E\{f_n(x)\}|^s\} \to 0$$

for all $1 \le s < \infty$.

(ii) If K is a density for which

$$\operatorname{ess\,sup} K(x) < \infty \quad \text{and} \quad \operatorname{ess\,sup} \|x\|^m K(x) < \infty \tag{4.19}$$

where the essential supremum is with respect to the Lebesgue measure on $\mathbb{R}^m$, and if $n h_n^m \to \infty$ and $x \in Q(\mu)$, then

$$E\{|f_n(x) - E\{f_n(x)\}|^s\} \to 0$$

for all $1 \le s < \infty$.

By Minkowski's inequality, for any f and for all s with $1 \le s < \infty$,

$$(E\{|f_n(x) - f(x)|^s\})^{1/s} \le (E\{|f_n(x) - E\{f_n(x)\}|^s\})^{1/s} + |E\{f_n(x)\} - f(x)| \tag{4.20}$$

so that we can combine theorem 4.3 with theorem 4.1 to get the following theorem.

Theorem 4.4. If $x \in Q(\mu)$, if $\{h_n\}$ satisfies (4.9), (4.10) and if K is a density on $\mathbb{R}^m$ satisfying (4.12) and (4.13), then

$$E\{|f_n(x) - f(x)|^s\} \to 0$$

for all s with $1 \le s < \infty$ and for all the densities f for $\mu$ that are continuous at x.

4.4  Convergence  in  Lr  of  the  Parzen-Rosenblatt  Estimator 

We showed in the introduction why it is important that $\|f_n(z) - f(z)\|_1 \to 0$ in probability or wp1. Glick (1974) established the connection between the pointwise consistency of $\{f_n\}$ for $\mu$ and the convergence of $\|f_n(z) - f(z)\|_1$ to 0.

For $\mu \in L_2$, Nadaraya (1963) proved, under the usual conditions on K and $\{h_n\}$ and the additional condition

$$n h_n^{2m} / \log n \to \infty, \tag{4.21}$$

that $\|f_n(z) - f(z)\|_2^2 \to 0$ wp1. We will prove, among other things, that if (4.9)-(4.13) hold, and if $\mu \in L_{2r}$ and has a density f which is almost everywhere continuous on $\mathbb{R}^m$, then $E\{\|f_n(z) - f(z)\|_{2r}^{2r}\}$ tends to 0 as n tends to infinity where r is a positive integer and f is any almost everywhere continuous density for $\mu$. Throughout this section, by almost everywhere we mean almost everywhere with respect to the Lebesgue measure on $\mathbb{R}^m$.

Theorem 4.5. If K is a density satisfying (4.13), if $\{h_n\}$ satisfies (4.9)-(4.10) and if either (4.12) holds or $\mu \in L_\infty$, then

$$\|f_n(z) - f(z)\|_1 \to 0 \text{ in probability}$$

for any density f for $\mu$ provided that at least one density of $\mu$ is ae continuous. If in addition (4.14) holds, then $\|f_n(z) - f(z)\|_1 \to 0$ wp1.

For r > 1, it is of course possible that $\mu$ does not belong to $L_r$ so that we will not be able to use Glick's theorem (Glick, 1974).

Let us first observe that if $\mu \in L_r$ for some r with $1 \le r \le \infty$, then $\mu \in L_s$ for all s with $1 \le s \le r$. To see this, notice that $\mu \in L_1$ (and, in fact, $\|f(z)\|_1 = 1$ for all f) and that, by Hölder's inequality,

$$\|f(z)\|_s^s = \int f^s(x)\,dx \le \Big(\int f^r(x)\,dx\Big)^{(s-1)/(r-1)} \Big(\int f(x)\,dx\Big)^{(r-s)/(r-1)} = (\|f(z)\|_r^r)^{(s-1)/(r-1)} \quad (1 \le s \le r < \infty) \tag{4.22}$$

and

$$\|f(z)\|_s^s \le \|f(z)\|_1 \, \|f(z)\|_\infty^{s-1} \quad (1 \le s < \infty). \tag{4.23}$$
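The two norm inequalities above, read as $\|f\|_s^s \le (\|f\|_r^r)^{(s-1)/(r-1)}$ and $\|f\|_s^s \le \|f\|_1 \|f\|_\infty^{s-1}$, can be checked numerically on a concrete density. The sketch below uses the triangular density f(x) = 2x on [0, 1] and a midpoint rule; the density, the exponents s = 2 and r = 4, and the helper name `lr_norm_power` are illustrative assumptions.

```python
def lr_norm_power(f, r, a=0.0, b=1.0, steps=100_000):
    """Midpoint-rule approximation of the integral of f(x)**r over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) ** r for i in range(steps)) * width


def f(x):
    return 2.0 * x  # triangular density on [0, 1]; its supremum is 2


s, r = 2.0, 4.0
lhs = lr_norm_power(f, s)                                 # ||f||_s^s
rhs_422 = lr_norm_power(f, r) ** ((s - 1.0) / (r - 1.0))  # (||f||_r^r)^((s-1)/(r-1))
rhs_423 = lr_norm_power(f, 1.0) * 2.0 ** (s - 1.0)        # ||f||_1 * ||f||_inf^(s-1)
print(lhs, rhs_422, rhs_423)
```

For this density the left-hand side is 4/3, and both right-hand sides are larger, as the inequalities require.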

To prove theorem 4.6, we first note that by Minkowski's inequality, for any f and r,

$$\|f_n(z) - f(z)\|_r \le \|f_n(z) - E\{f_n(z)\}\|_r + \|E\{f_n(z)\} - f(z)\|_r. \tag{4.24}$$

We have the following lemmas for both parts on the right hand side of (4.24).

Lemma 4.7. Let K be any density on $\mathbb{R}^m$, let $h_n \to 0$ and let $\mu$ have at least one ae continuous density f.

(i) If $\mu \in L_r$ with $1 \le r < \infty$, then

$$\|E\{f_n(z)\} - f(z)\|_r \to 0$$

for any ae continuous density f for $\mu$.

(ii) If f is a uniformly continuous density for $\mu$, then

$$\|E\{f_n(z)\} - f(z)\|_\infty \to 0.$$

Lemma 4.8. Let K be a density on $\mathbb{R}^m$ satisfying (4.13), let $n h_n^m \to \infty$ and let r be a positive integer. If either $\mu \in L_r$ or $n h_n^{m(2-1/r)} \to \infty$, then

$$E\{\|f_n(z) - E\{f_n(z)\}\|_{2r}^{2r}\} \to 0. \tag{4.25}$$

Lemmas 4.7, 4.8 and equation (4.24) trivially imply the following theorem.

Theorem 4.6. Let K be a density on $\mathbb{R}^m$ satisfying (4.13), let r be a positive integer, let $\{h_n\}$ satisfy (4.9)-(4.10) and let $\mu \in L_{2r}$. Then for every density f for $\mu$, if at least one density f of $\mu$ is ae continuous,

$$E\{\|f_n(z) - f(z)\|_{2r}^{2r}\} \to 0.$$

By Chebyshev's inequality, we have that under the conditions of theorem 4.6, for a given positive integer r,

$$\|f_n(z) - f(z)\|_{2r}^{2r} \to 0 \text{ in probability}.$$

Nadaraya (1973) proved the wp1 convergence of $\|f_n(z) - f(z)\|_2^2$ under the conditions of theorem 4.6 for r = 1 and under the additional requirements

(i) $\|x\|^m K(x) \to 0$ as $\|x\| \to \infty$
(ii) $n h_n^{2m}/\log n \to \infty$.        (4.26)

It can be shown that (4.26)(i) is not needed and that (4.26)(ii) can be relaxed to (4.14) ($n h_n^m/\log n \to \infty$) in order to be able to conclude that $\|f_n(z) - f(z)\|_2^2 \to 0$ wp1. The proof is not given here because it follows immediately if in Nadaraya's argument, Bennett's inequality (lemma 4.1) is used instead of Prokhorov's inequality.

In the next section, we will see under what conditions $\|f_n(z) - f(z)\|_\infty \to 0$ wp1. This result can then be used together with

$$\|f_n(z) - f(z)\|_r^r \le \|f_n(z) - f(z)\|_1 \, \|f_n(z) - f(z)\|_\infty^{r-1} \tag{4.27}$$

and theorem 4.5 to establish, for all r with $1 \le r < \infty$, the wp1 convergence of $\|f_n(z) - f(z)\|_r^r$ to 0 as n tends to infinity.

4.5 Uniform Convergence of the Parzen-Rosenblatt Estimator

The arguments that were used to prove the theorems in section 4.4 have no simple extension to $L_\infty$. The main objective of this section is to present new techniques to prove that

$$\sup_x |f_n(x) - f(x)| \to 0 \text{ wp1} \tag{4.28}$$

for some density f for $\mu$. It turns out that for a very large class of kernels in $\mathbb{R}^1$, whenever (4.28) holds for some function f, then f must be uniformly continuous (Schuster, 1969). It is therefore only natural to assume throughout this section that $\mu$ has a uniformly continuous density f. If both f and K are continuous, then it is easy to see that the supremum in (4.28) is a random variable. We assume throughout that the supremum in (4.28) is a random variable for all n.

Nadaraya (1965) showed that (4.28) holds in $\mathbb{R}^1$ if f is a uniformly continuous density for $\mu$, if K is a density of bounded variation that satisfies (4.12), and if $h_n \to 0$ and $n h_n^2/\log n \to \infty$. His argument is based on integration by parts. Foldes and Revesz (1974), also for m=1, showed that if both f and K satisfy a Lipschitz condition and $E\{\|X_1\|^\gamma\} < \infty$ for some $\gamma > 0$, then (4.28) holds under the usual conditions on $\{h_n\}$ (i.e., (4.9), (4.14)). Using the martingale convergence theorem, Van Ryzin (1969) showed that (4.28) holds for $m \ge 1$. To the usual conditions on K and $\{h_n\}$, he adds some smoothness conditions for K and some rather restrictive conditions for $\{h_n\}$.

Among other things we will prove that (4.28) holds for $m \ge 1$ if f is a uniformly continuous density for $\mu$, if K satisfies (4.11)-(4.13) and is Lipschitz, if $\{h_n\}$ satisfies

$$h_n \to 0 \tag{4.29}$$

and

$$n h_n^m/\log n \to \infty, \tag{4.30}$$

and if $E\{\|X_1\|^\gamma\} < \infty$ for some $\gamma > 0$. The Lipschitz condition on K and the existence condition for $E\{\|X_1\|^\gamma\}$ can be dropped and replaced by

(i) the closure of the set of discontinuities of K has Lebesgue measure 0
(ii) K has compact support.

We will assume throughout this section that the norm on $\mathbb{R}^m$ is $\|\cdot\|_\infty$. All the theorems remain valid for the norms $\|\cdot\|_k$, k=1,2,.... These norms should not be confused with the norms of section 4.4 on the space of all densities on $\mathbb{R}^m$.

For  the  sake  of  completeness  we  state  the  following  well-known 
result  (Nadaraya,  1965;  see  also  lemma  4.7(ii)). 

Theorem 4.7. If K is a density on $\mathbb{R}^m$ and f is a uniformly continuous density for $\mu$, then

$$\sup_x |E\{f_n(x)\} - f(x)| \to 0$$

provided that $h_n \to 0$.

Because

$$\sup_x |f_n(x) - f(x)| \le \sup_x |f_n(x) - E\{f_n(x)\}| + \sup_x |E\{f_n(x)\} - f(x)| \tag{4.31}$$

it suffices to find conditions that insure that $\sup_x |f_n(x) - E\{f_n(x)\}| \to 0$ wp1. Theorem 4.8 is an improvement of a result of Foldes and Revesz.

Theorem 4.8. If K is a density on $\mathbb{R}^m$ with

(i) $\sup_x K(x) < \infty$,
(ii) $\sup_x \|x\|^m K(x) < \infty$,        (4.32)
(iii) $\sup_x |K(x+y) - K(x)| \le C\|y\|$ for some $C > 0$ and all $y \in \mathbb{R}^m$,

and if f is a uniformly continuous density for $\mu$ with $\int \|x\|^\gamma f(x)\,dx < \infty$ for some $\gamma > 0$, and if (4.29) and (4.30) hold, then $\{f_n\}$ is a strongly uniformly consistent estimate for $\mu$.


The conditions that seem restrictive in theorem 4.8 are the Lipschitz condition in (4.32) and the moment condition imposed on f. With a completely different type of argument, using approximations for K and employing the bound (3.29) for deviations of empirical measures, it is possible to get rid of these conditions. Instead, let K satisfy condition (4.33) given below.

(i) K is a probability density on $\mathbb{R}^m$;
(ii) $\sup_x K(x) < \infty$;
(iii) K has compact support, i.e. there exists a $\rho > 0$ such that $\int_{[-\rho,+\rho]^m} K(x)\,dx = 1$;        (4.33)
(iv) the closure of the set of discontinuities of K has Lebesgue measure 0.

We remark that (4.33)(iv) is not a very restrictive condition at all. We prove the following theorem.

Theorem 4.9. If K satisfies (4.33), if (4.29) and (4.30) hold and if f is a uniformly continuous density for $\mu$, then $\{f_n\}$ is a strongly uniformly consistent estimate for $\mu$.
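A uniform kernel on [-1/2, 1/2] is bounded, has compact support and has only two discontinuities, so it is the simplest kernel covered by theorem 4.9. The sketch below, for m = 1, approximates the supremum by a maximum over a finite grid; the grid, the standard normal data and the bandwidth $h_n = n^{-1/5}$ are assumptions made for illustration, and the kernel estimate is assumed to have the form $f_n(x) = (n h_n)^{-1} \sum_i K((X_i - x)/h_n)$.

```python
import bisect
import math
import random


def normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def uniform_kernel_estimate(x, sorted_sample, h):
    """f_n(x) for K uniform on [-1/2, 1/2]: count of points in [x - h/2, x + h/2] over n*h."""
    lo = bisect.bisect_left(sorted_sample, x - 0.5 * h)
    hi = bisect.bisect_right(sorted_sample, x + 0.5 * h)
    return (hi - lo) / (len(sorted_sample) * h)


def sup_error(n, rng, grid):
    h = n ** -0.2  # assumed bandwidth: h_n -> 0 and n * h_n / log n -> infinity
    sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    return max(abs(uniform_kernel_estimate(x, sample, h) - normal_density(x)) for x in grid)


rng = random.Random(1)
grid = [i / 10.0 for i in range(-40, 41)]  # finite grid standing in for the supremum over x
for n in (200, 20_000):
    print(n, round(sup_error(n, rng, grid), 4))
```

Sorting the sample once and counting with `bisect` keeps each evaluation at logarithmic cost; this shortcut only works for the uniform kernel.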

One of the conditions in (4.33) is that K should have a compact support. One may wonder what happens if K does not have a compact support, for example, if K is gaussian. In the next theorem we will give a partial answer to this question. We will find that the conditions to be imposed on $\{h_n\}$ depend upon the rate of decrease to 0 of the tail of K. Consider the following condition to be used instead of (4.33)(iii).

There exists a continuous and monotonically decreasing function $u : [0,\infty) \to [0,\infty)$ with

(i) $\int_0^\infty z^{m-1} u(z)\,dz < \infty$ and
(ii) $K(x) \le u(\|x\|)$, all $x \in \mathbb{R}^m$.        (4.34)


The condition (4.34) states that K must be dominated by a bell-shaped and integrable function u. The condition (4.34)(i) implies that

$$\int_{\mathbb{R}^m} u(\|x\|)\,dx < \infty$$

and, by the monotonicity of u, it is not hard to show that $\|x\|^m u(\|x\|) \to 0$ as $\|x\| \to \infty$. Hence, $\|x\|^m K(x) \to 0$ as $\|x\| \to \infty$, in which we recognize the classical regularity condition for K (see (4.12)). Although $z^m u(z) \to 0$ as $z \to \infty$ does not imply (4.34)(i), it is a very close condition indeed. The following functions of z satisfy the requirements imposed on u:

$$1/(1+z)^{m(1+\beta)} \quad \text{and} \quad 1/\big((1+z)^m (1+\log(1+z))^{1+\beta}\big), \quad \beta > 0.$$

Needless to say, if K satisfies (4.34), then it is possible to find a continuous inverse $u^{-1}$ defined on some compact set [0,a] with the properties that $u^{-1}$ is monotonically decreasing and $u^{-1}(z) \to \infty$ as $z \to 0$. Notice further that $u^{-1}(z) \le (a_0/z)^{1/m}$ for some $a_0$ with $0 < a_0 < \infty$. We have the following theorem.

Theorem 4.10. If K satisfies (4.33) where (4.33)(iii) is replaced by (4.34), if (4.29) and (4.30) hold, if f is a uniformly continuous density for $\mu$ and if, in addition, for all $\varepsilon > 0$,

$$n h_n^m / \big((u^{-1}(\varepsilon h_n^m))^m \log n\big) \to \infty \tag{4.35}$$

then $\{f_n\}$ is a strongly uniformly consistent estimate for $\mu$.

Remark. Since $u^{-1}(z) \le (a_0/z)^{1/m}$ for some $a_0 < \infty$, it is clear that (4.35) is always fulfilled if

$$n h_n^{2m}/\log n \to \infty. \tag{4.36}$$

However, depending upon the rate of decrease of K to 0 as $\|x\| \to \infty$, weaker conditions than (4.36) can be obtained. For example, if $K(x) \le a_0/\|x\|^{\alpha m}$ where $\alpha \ge 1$, then the condition

$$n h_n^{m(1+1/\alpha)}/\log n \to \infty$$

is sufficient for (4.35). Theorem 4.9 can be viewed as an extreme case where $\alpha = \infty$.
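To see how the tail of K trades off against the bandwidth, consider power bandwidths $h_n = n^{-a}$ and read the condition of the remark as $n h_n^{m(1+1/\alpha)}/\log n \to \infty$. Then $n h_n^{m(1+1/\alpha)} = n^{1 - a m (1 + 1/\alpha)}$, so the condition holds exactly when $a < 1/(m(1+1/\alpha))$. The helper below (a hypothetical name, introduced only for this illustration) computes that critical exponent.

```python
def critical_exponent(m, alpha):
    """Upper limit (exclusive) on a such that h_n = n**(-a) satisfies
    n * h_n**(m * (1 + 1/alpha)) / log n -> infinity."""
    return 1.0 / (m * (1.0 + 1.0 / alpha))


for alpha in (1.0, 2.0, float("inf")):
    print(alpha, critical_exponent(2, alpha))
```

With m = 2 the limit is 1/4 for $\alpha = 1$, which is the exponent allowed by a condition of the form $n h_n^{2m}/\log n \to \infty$, and 1/2 for $\alpha = \infty$, which is the exponent allowed by $n h_n^m/\log n \to \infty$.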


4.6  The  Loftsgaarden-Quesenberry  Estimator 


In this section we study the asymptotic properties of a generalized version of the LQ estimate (Fix and Hodges, 1951; Loftsgaarden and Quesenberry, 1965). Let $\{k_{1n}\}$ and $\{k_{2n}\}$ be two sequences of positive integers such that

$$k_{1n} \to \infty, \tag{4.37}$$

$$1 \le k_{1n} \le k_{2n} \le n, \quad \text{all } n, \tag{4.38}$$

and

$$k_{2n}/n \to 0 \tag{4.39}$$

and consider the estimate

$$f_n(x) = \sum_{j=k_{1n}}^{k_{2n}} a_{jn}\,(j/n)\,(2\|X_j^x - x\|)^{-m}, \quad x \in \mathbb{R}^m, \tag{4.40}$$

where

(i) $a_n = (a_{k_{1n}n}, \ldots, a_{k_{2n}n})$ is a probability distribution,

(ii) $\|\cdot\|$ is the $\|\cdot\|_\infty$ norm. All the results of this section remain valid for standard norms such as $\|\cdot\|_k$, k=1,2,... provided that in (4.40) $(2\|X_j^x - x\|)^m$ is replaced by the volume of the sphere with radius $\|X_j^x - x\|$.

(iii) $X_j^x$ is the j-th nearest neighbor to x among $X_1, \ldots, X_n$ where ties are broken at random. Thus, $\|X_1^x - x\| \le \ldots \le \|X_n^x - x\|$.
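The estimate (4.40) can be sketched directly from its definition, read here as a weighted average over $j = k_{1n}, \ldots, k_{2n}$ of $(j/n)(2\|X_j^x - x\|_\infty)^{-m}$. In the sketch below the function names and the default uniform weight vector are assumptions made for illustration; any probability vector on the indices may be passed instead.

```python
import random


def sup_norm_distance(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def lq_estimate(x, sample, k1, k2, weights=None):
    """Weighted average over j = k1..k2 of (j/n) / (2 * ||X_j^x - x||_inf)^m."""
    n, m = len(sample), len(x)
    if weights is None:
        count = k2 - k1 + 1
        weights = [1.0 / count] * count  # uniform probability vector (an assumption)
    # Sorted distances: dists[j - 1] is the distance to the j-th nearest neighbor.
    dists = sorted(sup_norm_distance(xi, x) for xi in sample)
    return sum(a * (j / n) / (2.0 * dists[j - 1]) ** m
               for a, j in zip(weights, range(k1, k2 + 1)))


rng = random.Random(0)
sample = [(rng.random(), rng.random()) for _ in range(4000)]  # uniform on the unit square
print(lq_estimate((0.5, 0.5), sample, 100, 200))  # the true density at the center is 1
```

Sorting the distances makes the tie-breaking rule irrelevant for the value of the estimate. The sketch divides by zero if x coincides with one of its first $k_{2n}$ neighbors, which has probability zero when a density exists.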

Notice that $\sup_x |f_n(x) - f(x)|$ is a random variable if f is continuous since it is possible to replace the supremum over $\mathbb{R}^m$ by the supremum over a countable dense subset of $\mathbb{R}^m$ in view of the continuity of $|f_n - f|$. We remark that (4.40) is a "smooth" version of the original LQ estimate (Loftsgaarden and Quesenberry, 1965)

$$f_n(x) = (k_n/n)\,(2\|X_{k_n}^x - x\|)^{-m}, \quad x \in \mathbb{R}^m, \tag{4.41}$$

where (4.41) is obtained from (4.40) by letting $k_{1n} = k_{2n} = k_n$. In the original paper of Loftsgaarden and Quesenberry, the norm was $\|\cdot\|_2$ and the coefficient in (4.41) was $((k_n - 1)/n)$ but both changes are irrelevant to the results that are presented below. Loftsgaarden and Quesenberry (1965) showed that $f_n(x) \to f(x)$ in probability for each x at which f is continuous and positive, provided that $k_n/n \to 0$ and $k_n \to \infty$. Wagner (1973) extended this result to convergence wp1 under the additional condition (4.42)

$$\sum_{n=1}^\infty e^{-\alpha k_n} < \infty \quad \text{for all } \alpha > 0. \tag{4.42}$$

For m=1, Moore and Henrichon (1969) showed that $\sup_x |f_n(x) - f(x)| \to 0$ in probability if f is uniformly continuous and positive on $(-\infty, +\infty)$, if $k_n/n \to 0$ and if

$$k_n/\log n \to \infty. \tag{4.43}$$

Kim and Van Ryzin (1975), also for m=1, prove the same thing for a slightly different type of estimate under essentially the same conditions. We remark, once and for all, that (4.43) implies (4.42) and that (4.42) is sufficient for (4.43) if $\{k_n\}$ is nondecreasing.

We will prove the strong uniform consistency of $\{f_n\}$ (defined by (4.40) for n=1,2,...) for $\mu$ on $\mathbb{R}^m$ (where $m \ge 1$) if there exists a uniformly continuous density f for $\mu$, if (4.37)-(4.39) hold and if

$$k_{1n}^2/(k_{2n} \log n) \to \infty. \tag{4.44}$$

For $k_{1n} = k_{2n}$, the condition (4.44) reduces to (4.43). To prove the uniform consistency of $\{f_n\}$, we use some of the uniform inequalities for the deviation of empirical measures that were proved in chapter 3.
We first prove a pointwise consistency theorem for the estimate (4.40).

Theorem 4.11. Let $\{k_{1n}\}$ and $\{k_{2n}\}$ be positive integer sequences satisfying (4.37)-(4.39) and let $\{f_n\}$ be defined by (4.40) where $\{a_n\}$ is an arbitrary sequence of probability vectors satisfying (4.40)(i). Let $x \in Q(\mu)$ and let f be a density for $\mu$ that is continuous at x.

(i) If $\sup_n (k_{2n} - k_{1n}) < \infty$, then

$$|f_n(x) - f(x)| \to 0 \text{ in probability}. \tag{4.45}$$

(ii) If, in addition,

$$\sum_{n=1}^\infty e^{-\alpha k_{1n}^2/k_{2n}} < \infty \quad \text{for all } \alpha > 0, \tag{4.46}$$

then

$$|f_n(x) - f(x)| \to 0 \text{ wp1}. \tag{4.47}$$

(iii) If (4.44) holds, then (4.47) is valid.

The pointwise consistency theorems of Loftsgaarden and Quesenberry (1965) and Wagner (1973) can be derived from theorem 4.11 by letting $k_{1n} = k_{2n} = k_n$. The main result of this section is the following theorem.

Theorem 4.12. Let $\{k_{1n}\}$ and $\{k_{2n}\}$ be positive integer sequences satisfying (4.37)-(4.39) and let $\{f_n\}$ be defined by (4.40) where $\{a_n\}$ is an arbitrary sequence of probability vectors satisfying (4.40)(i). If f is a uniformly continuous density for $\mu$ and if (4.44) holds, then

$$\sup_x |f_n(x) - f(x)| \to 0 \text{ wp1}.$$

4.7 Proofs
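Proof of lemma 4.3

We will make repeated use of the gamma function identity

$$\Gamma(\alpha)\,\beta^\alpha = \int_0^\infty y^{\alpha-1} e^{-y/\beta}\,dy, \quad \alpha, \beta > 0.$$

Recall that for $\alpha$ integer, $\Gamma(\alpha) = (\alpha - 1)!$.

By Hoeffding's inequality (4.3),

$$E\{|S_n/n|^r\} = \int_0^\infty P\{|S_n/n|^r > u\}\,du$$
$$= \int_0^\infty \frac{r}{2}\, u^{r/2-1}\, P\{|S_n/n| > \sqrt{u}\}\,du$$
$$\le 2 \int_0^\infty \frac{r}{2}\, u^{r/2-1}\, e^{-2nu/g^2}\,du$$
$$= r\,\Gamma(r/2)\,(g^2/2n)^{r/2}.$$

By Bennett's inequality (4.4),

$$E\{|S_n/n|^r\} = \int_0^\infty r u^{r-1} P\{|S_n/n| > u\}\,du \le 2\int_0^{\sigma^2/g} r u^{r-1} e^{-nu^2/4\sigma^2}\,du + 2\int_{\sigma^2/g}^\infty r u^{r-1} e^{-nu/2g}\,du.$$

Extending both integrals to $(0,\infty)$ and applying the gamma function identity bounds the right-hand side by $r\,\Gamma(r/2)\,(4\sigma^2/n)^{r/2} + 2r\,\Gamma(r)\,(2g/n)^r$.

Q.E.D.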
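The Hoeffding part of the bound, $E\{|S_n/n|^r\} \le r\,\Gamma(r/2)\,(g^2/2n)^{r/2}$, can be compared with a simulated moment. The sketch below assumes that $S_n$ is a sum of n independent zero-mean variables whose range has length g (here uniform on [-1/2, 1/2], so g = 1); the function names and simulation sizes are illustrative.

```python
import math
import random


def hoeffding_moment_bound(r, n, g):
    """r * Gamma(r/2) * (g^2 / (2n))^(r/2)."""
    return r * math.gamma(r / 2.0) * (g * g / (2.0 * n)) ** (r / 2.0)


def simulated_moment(r, n, trials, rng):
    """Monte Carlo estimate of E|S_n/n|^r for summands uniform on [-1/2, 1/2]."""
    total = 0.0
    for _ in range(trials):
        mean = sum(rng.random() - 0.5 for _ in range(n)) / n
        total += abs(mean) ** r
    return total / trials


rng = random.Random(0)
n = 50
for r in (1, 2, 4):
    print(r, simulated_moment(r, n, 2000, rng), hoeffding_moment_bound(r, n, 1.0))
```

For r = 2 the exact moment is 1/(12n) while the bound is 1/n, so the bound is valid but not tight. This is expected, since Hoeffding's inequality uses only the range of the summands.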


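Proof of lemma 4.6

Let $1 \le r \le \infty$, let f be a density for $\mu$ and let $\mu \in L_r$. Then, with $\frac{1}{r} + \frac{1}{s} = 1$, we have for all $x \in \mathbb{R}^m$ by twice applying Hölder's inequality,

$$E\{f_n(x)\} = \int h_n^{-m} K((y-x)/h_n)\, f(y)\,dy$$
$$\le \|f(z)\|_r \, \|h_n^{-m} K(z/h_n)\|_s$$
$$\le \|f(z)\|_r \, \|h_n^{-m} K(z/h_n)\|_1^{1/s} \, \|h_n^{-m} K(z/h_n)\|_\infty^{(s-1)/s}$$
$$\le \|f(z)\|_r \, (\|K(z)\|_\infty/h_n^m)^{1-1/s}$$
$$= \|f(z)\|_r \, (\|K(z)\|_\infty/h_n^m)^{1/r}.$$

Q.E.D.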

Proof of theorem 4.1

Let $x \in Q(\mu)$ and let f be a density that is continuous at x. Find $\delta > 0$ so small that $|f(y+x) - f(x)| < \varepsilon/2$ for all $y \in \mathbb{R}^m$ with $\|y\| < \delta$. Then,

$$|E\{f_n(x)\} - f(x)| \le \int |f(x+y) - f(x)|\, h_n^{-m} K(y/h_n)\,dy$$
$$\le \varepsilon/2 + f(x) \int_{\{y:\|y\| \ge \delta\}} h_n^{-m} K(y/h_n)\,dy + \int_{\{y:\|y\| \ge \delta\}} h_n^{-m} K(y/h_n)\, f(x+y)\,dy.$$

If $\mu \in L_\infty$, then the two last terms are upper bounded by

$$(f(x) + \|f(z)\|_\infty) \int_{\{y:\|y\| \ge \delta/h_n\}} K(y)\,dy$$

which tends to 0 as $n \to \infty$ if $h_n \to 0$ and K is integrable. If $\mu \notin L_\infty$ but $\|x\|^m K(x) \to 0$ as $\|x\| \to \infty$, then we need only consider

$$\int_{\{y:\|y\| \ge \delta\}} h_n^{-m} K(y/h_n)\, f(x+y)\,dy \le \int_{\{y:\|y\| \ge \delta\}} \|y\|^{-m} (\|y\|/h_n)^m K(y/h_n)\, f(x+y)\,dy$$
$$\le \delta^{-m} \sup_{\{y:\|y\| \ge \delta/h_n\}} \|y\|^m K(y) \to 0.$$

Q.E.D.


Proof of theorem 4.2

Let f be a density for $\mu$ that is continuous at x. From theorem 4.1 and

$$|f_n(x) - f(x)| \le |f_n(x) - E\{f_n(x)\}| + |E\{f_n(x)\} - f(x)|$$

it suffices to show that $|f_n(x) - E\{f_n(x)\}| \to 0$ either in probability or wp1 under the conditions of the theorem.

Notice that $f_n(x) - E\{f_n(x)\} = \sum_{i=1}^n Y_i/n$ where $Y_1, \ldots, Y_n$ are iid random variables with $E\{Y_1\} = 0$, $|Y_1| \le h_n^{-m} \|K(z)\|_\infty$ wp1 and

$$E\{Y_1^2\} \le \int (h_n^{-m} K((y-x)/h_n))^2 f(y)\,dy \le h_n^{-m} \|K(z)\|_\infty \int h_n^{-m} K(y/h_n) f(y+x)\,dy = \|K(z)\|_\infty h_n^{-m} E\{f_n(x)\}.$$

In the proof of theorem 4.1, we have seen that if $\mu \in L_\infty$ or $\|x\|^m K(x) \to 0$ as $\|x\| \to \infty$, and $h_n \to 0$, then $E\{f_n(x)\} \to f(x)$. Thus there exists a constant $C_0 > 0$ depending upon x such that for all n, $E\{Y_1^2\} \le C_0 \|K(z)\|_\infty/h_n^m$. By (4.4), with $\sigma^2 = C_0 \|K(z)\|_\infty/h_n^m$ and $g = 2\|K(z)\|_\infty/h_n^m$, we have

$$P\{|f_n(x) - E\{f_n(x)\}| \ge \varepsilon\} \le 2\exp\big(-n\varepsilon^2/((2C_0 + 2\varepsilon)\|K(z)\|_\infty/h_n^m)\big) = 2 e^{-C_1 n h_n^m}$$

where $C_1 = \varepsilon^2/(2(C_0 + \varepsilon)\|K(z)\|_\infty)$. The "in probability" part of the theorem follows from the arbitrariness of $\varepsilon$ and (4.10). For the wp1 part, notice that (4.14) implies that

$$\sum_{n=1}^\infty P\{|f_n(x) - E\{f_n(x)\}| \ge \varepsilon\} < \infty$$

for all $\varepsilon > 0$. The Borel-Cantelli lemma then implies the second part of theorem 4.2.

Q.E.D.

Proof of theorem 4.3

To show (ii), let $1 \le s < \infty$ and let x be a continuity point of f. Recall from the proof of theorem 4.2 that

$$f_n(x) - E\{f_n(x)\} = \Big(\sum_{i=1}^n Y_i\Big)/n$$

where the $Y_i$ are iid random variables with zero mean, $\operatorname{ess\,sup} Y_1 - \operatorname{ess\,inf} Y_1 \le 2\|K(z)\|_\infty/h_n^m$ and

$$E\{Y_1^2\} \le \|K(z)\|_\infty h_n^{-m} E\{f_n(x)\} \le C_0 \|K(z)\|_\infty/h_n^m.$$

By lemma 4.3,

$$E\{|f_n(x) - E\{f_n(x)\}|^s\} \le s\,\Gamma(s/2)\,(4C_0\|K(z)\|_\infty/nh_n^m)^{s/2} + 2s\,\Gamma(s)\,(4\|K(z)\|_\infty/nh_n^m)^s \to 0$$

in view of (4.10) and $\|K(z)\|_\infty < \infty$.

To show (i), let $1 \le s < \infty$ and let $\mu \in L_r$ ($1 \le r \le \infty$). By lemma 4.6,

$$E\{Y_1^2\} \le \|K(z)\|_\infty h_n^{-m} E\{f_n(x)\} \le \|K(z)\|_\infty \|f(z)\|_r h_n^{-m} (\|K(z)\|_\infty/h_n^m)^{1/r}$$

and thus, by lemma 4.3,

$$E\{|f_n(x) - E\{f_n(x)\}|^s\} \le 2s\,\Gamma(s)\,(4\|K(z)\|_\infty/nh_n^m)^s + s\,\Gamma(s/2)\,\big(4\|K(z)\|_\infty^{1+1/r}\|f(z)\|_r/nh_n^{m(1+1/r)}\big)^{s/2}$$

which tends to 0 as $n \to \infty$ in view of (4.10), (4.18) and $\|K(z)\|_\infty < \infty$.

Q.E.D.

Proof of theorem 4.5

Theorem 4.5 is a corollary of theorem 4.2 and a theorem of Glick (1974), which states that if $f_n$ satisfies some measurability conditions (that are satisfied here) and $f_n$ is a density for all n, and $f_n(x) \to f(x)$ in probability (wp1) ae (Lebesgue measure on $\mathbb{R}^m$), then

$$\int |f_n(x) - f(x)|\,dx \to 0 \text{ in probability (wp1)}.$$

Q.E.D.

Proof of lemma 4.7

Let f be an ae continuous density for $\mu$ and let $\mu \in L_r$ where $1 \le r < \infty$. Note that

$$\|E\{f_n(z)\} - f(z)\|_r^r = \int \Big| \int (f(y+x) - f(x))\, h_n^{-m} K(y/h_n)\,dy \Big|^r dx$$
$$\le \int\!\!\int |f(y+x) - f(x)|^r\, h_n^{-m} K(y/h_n)\,dy\,dx$$
$$= \int h_n^{-m} K(y/h_n) \Big( \int |f(x+y) - f(x)|^r\,dx \Big) dy$$

by Jensen's inequality and Tonelli's theorem. But

$$\lim_{\|y\| \to 0} |f(x+y) - f(x)|^r = 0 \text{ ae}$$

by the ae continuity of f. Since, by the $c_r$-inequality,

$$|f(x+y) - f(x)|^r \le 2^{r-1}(f^r(x+y) + f^r(x))$$

and

$$2^{r-1} \int (f^r(x+y) + f^r(x))\,dx = 2^r \|f(z)\|_r^r < \infty,$$

we have by a version of the Lebesgue dominated convergence theorem that

$$\int |f(x+y) - f(x)|^r\,dx \to 0 \text{ as } \|y\| \to 0.$$

Find $\delta$ small enough so that

$$\int |f(x+y) - f(x)|^r\,dx \le \varepsilon/2$$

for all $\|y\| \le \delta$. Then,

$$\|E\{f_n(z)\} - f(z)\|_r^r \le \varepsilon/2 + \Big( \int_{\{y:\|y\| > \delta\}} h_n^{-m} K(y/h_n)\,dy \Big) 2^r \|f(z)\|_r^r$$
$$= \varepsilon/2 + 2^r \|f(z)\|_r^r \int_{\{y:\|y\| > \delta/h_n\}} K(y)\,dy < \varepsilon$$

for all n large enough since $\mu \in L_r$, K is a density and $h_n \to 0$. The first part of lemma 4.7 follows from the arbitrariness of $\varepsilon$.

For the second part of lemma 4.7, let f be a uniformly continuous density for $\mu$. Then,

$$\|E\{f_n(z)\} - f(z)\|_\infty = \sup_x |E\{f_n(x)\} - f(x)|$$
$$\le \sup_x \int |f(y+x) - f(x)|\, h_n^{-m} K(y/h_n)\,dy$$
$$\le \int_{\{y:\|y\| < \delta\}} (\varepsilon/2)\, h_n^{-m} K(y/h_n)\,dy + \int_{\{y:\|y\| \ge \delta/h_n\}} \|f(z)\|_\infty K(y)\,dy$$

where $\delta > 0$ is so small that $\sup_x |f(y+x) - f(x)| < \varepsilon/2$ for all y with $\|y\| < \delta$. Thus, since K is a density, $\|f(z)\|_\infty < \infty$ and $h_n \to 0$, we have that

$$\|E\{f_n(z)\} - f(z)\|_\infty < \varepsilon$$

for all n large enough. The second part of lemma 4.7 follows from the arbitrariness of $\varepsilon$.

Q.E.D.

Proof of lemma 4.8

Let r be a positive integer. Note that by Jensen's inequality and by Tonelli's theorem,

$$\big(E\{\|f_n(z) - E\{f_n(z)\}\|_{2r}\}\big)^{2r} \le E\{\|f_n(z) - E\{f_n(z)\}\|_{2r}^{2r}\} = \int E\{|f_n(x) - E\{f_n(x)\}|^{2r}\}\,dx.$$

Now, $f_n(x) - E\{f_n(x)\} = (\sum_{i=1}^n Y_i)/n$ where the $Y_i$ are iid random variables with $E\{Y_1\} = 0$. Furthermore, for all k with $1 \le k \le r$,

$$E\{|Y_1|^{2k}\} = E\{|h_n^{-m}(K((X_1-x)/h_n) - E\{K((X_1-x)/h_n)\})|^{2k}\}$$
$$\le 2^{2k-1}\big(E\{|h_n^{-m} K((X_1-x)/h_n)|^{2k}\} + |E\{h_n^{-m} K((X_1-x)/h_n)\}|^{2k}\big)$$
$$\le 2^{2k} E\{|h_n^{-m} K((X_1-x)/h_n)|^{2k}\}$$
$$\le 2^{2k} (h_n^{-m} \|K(z)\|_\infty)^{2k-1} E\{h_n^{-m} K((X_1-x)/h_n)\}.$$

By Rosen's theorem (lemma 4.5) there exists a universal constant $c_r$ only depending upon r such that

$$E\{|f_n(x) - E\{f_n(x)\}|^{2r}\} \le c_r \max\big( (2\|K(z)\|_\infty/nh_n^m)^r (2E\{f_n(x)\})^r ;\ (2\|K(z)\|_\infty/nh_n^m)^{2r-1} (2E\{f_n(x)\}) \big).$$

If $\mu \in L_r$ then we have by Tonelli's theorem,

$$E\{\|f_n(z) - E\{f_n(z)\}\|_{2r}^{2r}\} \le c_r (2\|K(z)\|_\infty/nh_n^m)^r\, 2^r \int (E\{f_n(x)\})^r\,dx + 2c_r (2\|K(z)\|_\infty/nh_n^m)^{2r-1} \to 0$$

in view of (4.10), (4.13) and

$$\int (E\{f_n(x)\})^r\,dx \le \int\!\!\int h_n^{-m} K((y-x)/h_n)\, f^r(y)\,dy\,dx = \int f^r(y)\,dy = \|f(z)\|_r^r < \infty.$$

If $\mu \notin L_r$ but $n h_n^{m(2-1/r)} \to \infty$, then we have by Tonelli's theorem,

$$E\{\|f_n(z) - E\{f_n(z)\}\|_{2r}^{2r}\} \le c_r (2\|K(z)\|_\infty/nh_n^m)^r\, 2^r (\|K(z)\|_\infty/h_n^m)^{r-1} + 2c_r (2\|K(z)\|_\infty/nh_n^m)^{2r-1} \to 0.$$

Q.E.D.

Proof of theorem 4.7

Let $\varepsilon > 0$ be arbitrary and pick $\delta > 0$ so small that $|f(x+y) - f(x)| < \varepsilon/2$ whenever $\|y\| < \delta$. Then,

$$\sup_x |E\{f_n(x)\} - f(x)| \le \sup_x \int |f(x+y) - f(x)|\, h_n^{-m} K(y/h_n)\,dy$$
$$\le \varepsilon/2 + \int_{\{y:\|y\| \ge \delta/h_n\}} (\sup_x f(x))\, K(y)\,dy < \varepsilon$$

for all n large enough in view of $\sup_x f(x) < \infty$, $\int K(y)\,dy < \infty$ and $h_n \to 0$.

Q.E.D.
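The uniform bias statement just proved can be checked without simulation in a simple case. The sketch below assumes m = 1, a standard normal density f and a kernel K uniform on [-1/2, 1/2], for which $E\{f_n(x)\} = \int h^{-1} K((y-x)/h) f(y)\,dy = (\Phi(x + h/2) - \Phi(x - h/2))/h$ with $\Phi$ the normal distribution function. The supremum is approximated over a finite grid, and all names are illustrative.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def expected_estimate(x, h):
    """E{f_n(x)} for a uniform kernel on [-1/2, 1/2] and standard normal data."""
    return (normal_cdf(x + 0.5 * h) - normal_cdf(x - 0.5 * h)) / h


def sup_bias(h, grid):
    return max(abs(expected_estimate(x, h) - normal_density(x)) for x in grid)


grid = [i / 100.0 for i in range(-500, 501)]  # finite grid standing in for the supremum
biases = [sup_bias(h, grid) for h in (1.0, 0.5, 0.1, 0.01)]
print(biases)
```

The bias depends on n only through the bandwidth, which is why theorem 4.7 needs no condition other than $h_n \to 0$.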


Proof of theorem 4.8

Let $S(x,r) = \{y : y \in \mathbb{R}^m, \|y - x\| \le r\}$ where $x \in \mathbb{R}^m$ and $r \ge 0$. Let further $K_1 = \sup_x K(x)$, $K_2 = \sup_x \|x\|^m K(x)$, $K_3 = \sup_x f(x)$ and $K_4 = 2^\gamma \int \|x\|^\gamma f(x)\,dx$. Let $\varepsilon > 0$ be arbitrary. Define the positive number sequences $\{a_n\}$ and $\{b_n\}$ and the integer sequence $\{k_n\}$ as follows. Let

$$a_n = (16 K_1 K_4/\varepsilon h_n^m)^{1/\gamma},$$

$$b_n = \varepsilon h_n^{m+1}/8C,$$

and

$$k_n = \lceil 2a_n/b_n \rceil$$

where C is the constant of (4.32)(iii) and $\lceil u \rceil$ stands for the smallest integer not smaller than u.

First find $(k_n)^m$ points $y_i$, $i = 1, \ldots, k_n^m$, all belonging to $S(0,a_n)$ such that for all x in $S(0,a_n)$ there exists a $y_i$ with $\|y_i - x\| \le b_n$. Clearly,

$$P\{\sup_x |f_n(x) - E\{f_n(x)\}| \ge \varepsilon\} \le P\{\sup_{x:\|x\| > a_n} |f_n(x) - E\{f_n(x)\}| \ge \varepsilon\} + P\{\sup_{x:\|x\| \le a_n} |f_n(x) - E\{f_n(x)\}| \ge \varepsilon\}$$

and

$$P\{\sup_{x:\|x\| \le a_n} |f_n(x) - E\{f_n(x)\}| \ge \varepsilon\}$$
$$\le \sum_{j=1}^{k_n^m} P\{\sup_{x \in S(y_j,b_n)} |f_n(x) - E\{f_n(x)\}| \ge \varepsilon\}$$
$$\le k_n^m \sup_{1 \le j \le k_n^m} P\{\sup_{x \in S(y_j,b_n)} |f_n(x) - E\{f_n(x)\}| \ge \varepsilon\}$$
$$\le k_n^m \sup_{1 \le j \le k_n^m} \Big[ P\{\sup_{x \in S(y_j,b_n)} |f_n(x) - f_n(y_j)| \ge \varepsilon/3\} + P\{\sup_{x \in S(y_j,b_n)} |E\{f_n(x)\} - E\{f_n(y_j)\}| \ge \varepsilon/3\}$$
$$\qquad + P\Big\{\Big|\sum_{i=1}^n \big(K((X_i - y_j)/h_n) - E\{K((X_i - y_j)/h_n)\}\big)/nh_n^m\Big| \ge \varepsilon/3\Big\} \Big].$$

Note that

$$\sup_{x \in S(y_j,b_n)} |f_n(x) - f_n(y_j)| \le h_n^{-m} \sup_{(x,z) \in S(y_j,b_n) \times \mathbb{R}^m} |K((z-x)/h_n) - K((z-y_j)/h_n)| \le C b_n/h_n^{m+1} = \varepsilon/8 < \varepsilon/3$$

and that

$$\sup_{x \in S(y_j,b_n)} |E\{f_n(x)\} - E\{f_n(y_j)\}| \le E\{\sup_{x \in S(y_j,b_n)} |f_n(x) - f_n(y_j)|\} < \varepsilon/3.$$


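We thus have that

$$P\{\sup_{x:\|x\| \le a_n} |f_n(x) - E\{f_n(x)\}| \ge \varepsilon\} \le k_n^m \sup_{1 \le j \le k_n^m} P\Big\{\Big|\sum_{i=1}^n \big(K((X_i - y_j)/h_n) - E\{K((X_i - y_j)/h_n)\}\big)/nh_n^m\Big| \ge \varepsilon/3\Big\}.$$

Consider the iid random variables $Y_1, \ldots, Y_n$ where

$$Y_i = \big(K((X_i - y_j)/h_n) - E\{K((X_i - y_j)/h_n)\}\big)/h_n^m, \quad 1 \le i \le n,$$

and note that $E\{Y_1\} = 0$, $E\{Y_1^2\} \le K_1 K_3/h_n^m$, and $\operatorname{ess\,sup} Y_1 - \operatorname{ess\,inf} Y_1 \le K_1/h_n^m$. Thus, by Bennett's inequality and $k_n \le 1 + 2a_n/b_n$, we have,

$$P\{\sup_{x:\|x\| \le a_n} |f_n(x) - E\{f_n(x)\}| \ge \varepsilon\} \le (1 + 2a_n/b_n)^m\, 2 e^{-C_1 n h_n^m}$$

where

$$C_1 = (\varepsilon/3)^2/(2K_1K_3 + K_1\varepsilon/3).$$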

Next,

$$P\{\sup_{x:\|x\| > a_n} |f_n(x) - E\{f_n(x)\}| \ge \varepsilon\} \le P\{\sup_{x:\|x\| > a_n} f_n(x) \ge \varepsilon/2\} + P\{\sup_{x:\|x\| > a_n} f(x) \ge \varepsilon/4\} + P\{\sup_x |f(x) - E\{f_n(x)\}| \ge \varepsilon/4\}.$$

The last probability on the right-hand side is 0 for all n large enough by theorem 4.7. The second probability on the right-hand side is 0 for all n large enough since the uniform continuity of f and the integrability of f imply that $f(x) \to 0$ as $\|x\| \to \infty$, and since $a_n \to \infty$. Further,

$$P\{\sup_{x:\|x\| > a_n} f_n(x) \ge \varepsilon/2\}$$
$$\le P\Big\{\sup_{x:\|x\| > a_n} (1/nh_n^m) \sum_{i:\|X_i - x\| > a_n/2} K((X_i - x)/h_n) \ge \varepsilon/4\Big\} + P\Big\{\sup_{x:\|x\| > a_n} (1/nh_n^m) \sum_{i:\|X_i - x\| \le a_n/2} K((X_i - x)/h_n) \ge \varepsilon/4\Big\}$$
$$\le P\{(1/nh_n^m)\, nh_n^m K_2/(a_n/2)^m \ge \varepsilon/4\} + P\Big\{(1/nh_n^m) \sum_{i=1}^n K_1 I_{\{\|X_i\| \ge a_n/2\}} \ge \varepsilon/4\Big\}.$$

The first probability is 0 for all n large enough since $a_n \to \infty$. Further,

$$P\{\|X_1\| \ge a_n/2\} \le E\{\|X_1\|^\gamma\}/(a_n/2)^\gamma = K_4/a_n^\gamma = \varepsilon h_n^m/16K_1$$

so that

$$P\Big\{(1/nh_n^m) \sum_{i=1}^n K_1 I_{\{\|X_i\| \ge a_n/2\}} \ge \varepsilon/4\Big\} \le P\Big\{\sum_{i=1}^n \big(I_{\{\|X_i\| \ge a_n/2\}} - P\{\|X_i\| \ge a_n/2\}\big)/n \ge \varepsilon h_n^m/8K_1\Big\} \le e^{-C_2 n h_n^m}$$

by Bennett's inequality where $C_2 = \varepsilon/16K_1$. Thus, combining bounds, we have for all n large enough,

$$P\{\sup_x |f_n(x) - E\{f_n(x)\}| \ge \varepsilon\} \le e^{-C_2 n h_n^m} + (1 + C_3/h_n^{1+m+m/\gamma})^m\, 2 e^{-C_1 n h_n^m}$$

where

$$C_3 = (16C/\varepsilon)(16 K_1 K_4/\varepsilon)^{1/\gamma}.$$

Obviously, (4.30) implies that $\sum_{n=1}^\infty P\{\sup_x |f_n(x) - E\{f_n(x)\}| \ge \varepsilon\} < \infty$ for all $\varepsilon > 0$ so that, by the Borel-Cantelli lemma, $\sup_x |f_n(x) - E\{f_n(x)\}| \to 0$ wp1. Finally, (4.28) follows from (4.31) and theorem 4.7.

Q.E.D.

Proof of theorem 4.9

To  prove  theorem  4. 9, we  need  the  following  auxiliary 
result. Let  K be  a Borel  measurable  function  on  lRm  satisfying 

(i)  OsK(x)sM  ,all  xg  IRm  . 

(ii)  K(x)-»0  as  j|x||-»®. 

(iii)  If  D={x:xeRm,K  is  not  continuous  at  x]  .then  the  closure 
of  D (denoted  by  D)  has  Lebesgue  measure  zero. 


If  a rectangle  is  a product  of  m finite  intervals , then  the  following 
is  true. For  all  e>0  and  6>0  there  exist  integers  N ,N2  and  rec- 
tangles A2 ^ , Bj Bn  and  positive  numbers  aN^P 

such  that  with 

K*(x)  = g •*€«  ■ 

we  have, 

(iv)  |K* (x)-K(x)  |<  e except  on  a set  S. 

n2  n2 

(v)  Sc  u B.  where  U B.  has  Lebesgue  measure  smaller 

i=I  1 i=l  1 

than  6 . 

(vi)  0<K*(x)sM  for  all  x. 

r 1 ^ 

(vii) K*(x)=0  outside  a compact  set  l-p,  + pJ  . 

To see this, let $A_\rho = [-\rho,+\rho]^m$. By (ii), choose $\rho$ such that $K(x) < \varepsilon$ on $(A_\rho)^c$ where $(\cdot)^c$ denotes the complement of a set. Let $\nu$ be the Lebesgue measure on $R^m$. Note that $\nu(D \cap A_\rho) \le \nu(D) = 0$. Because $D \cap A_\rho$ is measurable, there exists an open set $O$ with $D \cap A_\rho \subset O$ and $\nu(O) < \delta$. Since $R^m$ is a separable metric space, we can find open rectangles $R_1, R_2, \ldots$ with $O = \bigcup_{i=1}^{\infty} R_i$. Now, $\{R_i\}$ is an open covering for the compact set $D \cap A_\rho$ so that, by the Heine-Borel property, we can find a finite subset $B_1,\ldots,B_{N_2}$ of $\{R_i, i=1,2,\ldots\}$ satisfying

$$D \cap A_\rho \subset \bigcup_{i=1}^{N_2} B_i = E.$$

Clearly, $A_\rho \cap E^c$ is a compact set and since $K$ is continuous on $D^c$, $K$ must be uniformly continuous on $A_\rho \cap E^c$. Since $A_\rho \cap E^c$ is a finite number of rectangles, it is possible to partition $A_\rho \cap E^c$ into a disjoint union of $N_1$ rectangles each with diameter smaller than some $\theta>0$, where $\theta$ is picked so small that in each rectangle, $\sup K(x) - \inf K(x) < \varepsilon$. Thus, $A_\rho \cap E^c = \bigcup_{i=1}^{N_1} A_i$. Note that one can always choose $N_1 \le (2N_2+2)^m + (1+2\rho/\theta)^m$. Pick one $x_i$ in every $A_i$ and let $a_i = K(x_i)$. Clearly, $0 \le a_i \le M$, $1 \le i \le N_1$. Let the $a_i$ and $A_i$ define $K^*$. By the disjointness of the $A_i$, $0 \le K^*(x) \le M$. Further, $K^*=0$ on $A_\rho^c$ and $|K(x)-K^*(x)| < \varepsilon$ except possibly on a set $S$. Since $K(x) < \varepsilon$ on $A_\rho^c$, it is clear that $S \subset A_\rho$. By construction we know that $S \subset (A_\rho \cap E^c)^c$. Therefore $S \subset E = \bigcup_{i=1}^{N_2} B_i$ and $\nu(E) < \delta$, proving (iv)-(vii).
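The construction of $K^*$ can be tried out numerically. The following is only a sketch under simplifying assumptions that are not in the text: $m=1$, a Gaussian kernel (continuous everywhere, so no exceptional set $S$ is needed) and cells of equal width. The names `step_approximation` and `gaussian_kernel` are hypothetical.

```python
import math

def step_approximation(kernel, rho, n_cells):
    """Return K*(x) = sum_i a_i 1{x in A_i} on [-rho, rho), zero outside."""
    width = 2.0 * rho / n_cells
    # a_i = K(x_i), where x_i is the midpoint of the i-th cell A_i.
    heights = [kernel(-rho + (i + 0.5) * width) for i in range(n_cells)]

    def k_star(x):
        if x < -rho or x >= rho:
            return 0.0
        i = min(int((x + rho) / width), n_cells - 1)
        return heights[i]

    return k_star

def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

rho = 4.0
k_star = step_approximation(gaussian_kernel, rho, n_cells=400)
grid = [-6.0 + 12.0 * j / 5000 for j in range(5001)]
# Largest deviation |K(x) - K*(x)| observed on the grid.
max_err = max(abs(gaussian_kernel(x) - k_star(x)) for x in grid)
print(max_err)
```

Refining the partition (a larger `n_cells`) and enlarging `rho` both reduce the deviation, which mirrors the roles of $\theta$ and $\rho$ in the argument above.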


By (4.31) and theorem 4.7 we need only show that $\sup_x |f_n(x)-E\{f_n(x)\}| \to 0$ wp1. We will show that there exist $C_1>0$, $C_2>0$, $C_3>0$ such that for all $n$ large enough,

$$P\{\sup_x |f_n(x)-E\{f_n(x)\}| \ge \varepsilon\} \le C_1 n^{C_2} e^{-C_3 n h_n^m}.$$

Of course, theorem 4.9 follows then immediately by (4.30) and the Borel-Cantelli lemma.


Let us first define $A_{(x,a)}$, where $A$ is a Borel set from $R^m$, $x \in R^m$ and $a>0$:

$$A_{(x,a)} = \{y : y \in R^m,\ y = x + at \text{ for some } t \in A\}.$$

Next, let $K_1 = \sup_x K(x)$, $K_2 = \sup_x \|x\|^m K(x)$ and $K_3 = \sup_x f(x)$. Choose $\delta>0$ so small that $\delta < \varepsilon/12K_1K_3$ and choose $\theta>0$ so small that $\theta < \varepsilon/8K_3(2\rho)^m$ where $\rho$ is defined in (4.33)(iii).

Because $K$ satisfies the conditions (i)-(iii), we know that given $\delta>0$ and $\theta>0$ there exist integers $N_1$, $N_2$, rectangles $A_1,\ldots,A_{N_1}$, $B_1,\ldots,B_{N_2}$ and numbers $a_1,\ldots,a_{N_1}$ such that the estimate

$$K^*(x) = \sum_{i=1}^{N_1} a_i I_{\{x \in A_i\}}$$

satisfies

1) $|K^*(x)-K(x)| < \theta$ except on a set $S$.

2) $S \subset \bigcup_{i=1}^{N_2} B_i$, which has Lebesgue measure smaller than $\delta$ and is contained in $[-\rho,+\rho]^m$.

3) $K^*=0$ on $([-\rho,+\rho]^m)^c$.

4) $0 \le K^*(x) \le K_1$ for all $x \in R^m$.

Let $T = \bigcup_{i=1}^{N_2} B_i$. Then we have,

$$\sup_x |f_n(x)-E\{f_n(x)\}| = \sup_x \Big| \int h_n^{-m} K((y-x)/h_n)\,dF_n(y) - \int h_n^{-m} K((y-x)/h_n)\,dF(y) \Big| \le \sum_{i=1}^{3} \sup_x U_i(x)$$

where

$$U_1(x) = \Big| \int h_n^{-m} K^*((y-x)/h_n)\,dF_n(y) - \int h_n^{-m} K^*((y-x)/h_n)\,dF(y) \Big|,$$

$$U_2(x) = \int h_n^{-m} |K((y-x)/h_n)-K^*((y-x)/h_n)|\,dF_n(y)$$

and

$$U_3(x) = \int h_n^{-m} |K((y-x)/h_n)-K^*((y-x)/h_n)|\,dF(y).$$

Let $A^* = [-1,+1]^m$, $B_1^* = A^{*c}_{(x,\rho h_n)}$, $B_2^* = A^*_{(x,\rho h_n)} \cap S^c_{(x,h_n)}$, and $B_3^* = S_{(x,h_n)}$. Now,

$$\sup_x U_3(x) \le \sum_{i=1}^{3} \sup_x \int_{B_i^*} h_n^{-m} |K((y-x)/h_n)-K^*((y-x)/h_n)|\,dF(y).$$

Let $\nu$ denote the Lebesgue measure on $R^m$. Then we have

$$\sup_x U_3(x) \le 0 + \theta K_3 h_n^{-m} \nu(A^*_{(x,\rho h_n)}) + K_1 K_3 h_n^{-m} \nu(S_{(x,h_n)}) \le \theta K_3 h_n^{-m} (2\rho h_n)^m + K_1 K_3 \delta h_n^m h_n^{-m} < \varepsilon/8 + \varepsilon/12.$$

Let $G_n$ be the class of all rectangles from $R^m$ with maximal distance between any two points in any rectangle in $G_n$ being $2m\rho h_n$. Note that all $A_{i(x,h_n)}$, $B_{i(x,h_n)}$ and $A^*_{(x,\rho h_n)}$ belong to $G_n$. Next,

$$\sup_x U_1(x) = \sup_x \Big| \sum_{i=1}^{N_1} a_i h_n^{-m} \big(\mu_n(A_{i(x,h_n)}) - \mu(A_{i(x,h_n)})\big) \Big| \le \sum_{i=1}^{N_1} K_1 h_n^{-m} \sup_x |\mu_n(A_{i(x,h_n)}) - \mu(A_{i(x,h_n)})| \le N_1 K_1 h_n^{-m} \sup_{A_0 \in G_n} |\mu_n(A_0)-\mu(A_0)|$$

and

$$\sup_x U_2(x) \le \sum_{i=1}^{3} \sup_x \int_{B_i^*} h_n^{-m} |K((y-x)/h_n)-K^*((y-x)/h_n)|\,dF_n(y)$$

$$\le 0 + \sup_x K_1 h_n^{-m} \mu_n(T_{(x,h_n)}) + \theta h_n^{-m} \sup_x \mu_n(A^*_{(x,\rho h_n)})$$

$$\le K_1 N_2 h_n^{-m} \sup_{A_0 \in G_n} |\mu_n(A_0)-\mu(A_0)| + K_1 h_n^{-m} \sup_x \mu(T_{(x,h_n)}) + \theta h_n^{-m} \sup_x |\mu_n(A^*_{(x,\rho h_n)}) - \mu(A^*_{(x,\rho h_n)})| + \theta h_n^{-m} \sup_x \mu(A^*_{(x,\rho h_n)})$$

$$\le (K_1 N_2 + \theta) h_n^{-m} \sup_{A_0 \in G_n} |\mu_n(A_0)-\mu(A_0)| + \varepsilon/12 + \theta K_3 (2\rho)^m$$

$$< (K_1 N_2 + \theta) h_n^{-m} \sup_{A_0 \in G_n} |\mu_n(A_0)-\mu(A_0)| + \varepsilon/12 + \varepsilon/8.$$

Collecting bounds we see that

$$\sup_x |f_n(x)-E\{f_n(x)\}| < \varepsilon/2 + (K_1N_1+K_1N_2+\theta) h_n^{-m} \sup_{A_0 \in G_n} |\mu_n(A_0)-\mu(A_0)|$$

and

$$P\{\sup_x |f_n(x)-E\{f_n(x)\}| \ge \varepsilon\} \le P\{\sup_{A_0 \in G_n} |\mu_n(A_0)-\mu(A_0)| \ge h_n^m C_4\}$$

where $C_4 = \varepsilon/(2(K_1N_1+K_1N_2+\theta))$. Note that

$$\sup_{A \in G'_n} \mu(A) \le K_2 (4\rho m h_n)^m$$

where $G'_n$ is the class of all rectangles from $R^m$ with maximal distance between any two points in any rectangle in $G'_n$ being $4\rho m h_n$. By (3.29),

$$P\{\sup_{A_0 \in G_n} |\mu_n(A_0)-\mu(A_0)| \ge C_4 h_n^m\} \le 4(1+2n)^{2m} e^{-n h_n^m (C_4^2/(64C_5+4C_4))} + 4n e^{-n h_n^m C_5/10}$$

for all $n$ with $n h_n^m C_5 \ge 1$, $C_5 h_n^m \le 1/2$ and $n h_n^m C_4^2/(8C_5) \ge 1$, where $C_5 = K_2(4\rho m)^m$. This concludes the proof of theorem 4.9.

Q.E.D.
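For orientation, the kernel estimate whose deviation $\sup_x |f_n(x)-E\{f_n(x)\}|$ is bounded above can be written out in a few lines. This is a minimal sketch assuming real-valued data ($m=1$) and a box kernel; the function names are hypothetical and the sample is made up for illustration.

```python
def kernel_estimate(x, sample, h, kernel):
    """f_n(x) = (1 / (n h)) * sum_i K((X_i - x) / h) for real-valued data (m = 1)."""
    return sum(kernel((xi - x) / h) for xi in sample) / (len(sample) * h)

def box_kernel(u):
    # Bounded probability density that vanishes outside [-1, 1].
    return 0.5 if -1.0 <= u <= 1.0 else 0.0

sample = [0.0, 0.4, 1.0, 1.1, 3.0]
# Only the observations 1.0 and 1.1 fall within distance h of x = 1.0.
value = kernel_estimate(1.0, sample, h=0.5, kernel=box_kernel)
print(value)
```

With the box kernel the estimate is the fraction of observations within distance $h$ of $x$, divided by the length $2h$ of the window.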


Proof of theorem 4.10

The proof of theorem 4.9 can be repeated with a few changes. First we know that we can find a $K^*$ as in theorem 4.9 (satisfying the properties 1)-4)). Without loss of generality, we can choose a $\rho$ large enough so that

$$\int_\rho^\infty z^{m-1} u(z)\,dz < (\varepsilon/16K_2)/2^{2m-1}.$$

Of course, with $\delta < \varepsilon/12K_1K_2$ and $\theta < \varepsilon/(8K_3(2\rho)^m)$, we have upon inspection of the proof of theorem 4.9

$$\sup_x |f_n(x)-E\{f_n(x)\}| \le \sup_x \Big| \int_{A^{*c}_{(x,\rho h_n)}} h_n^{-m} K((y-x)/h_n)\,dF_n(y) - \int_{A^{*c}_{(x,\rho h_n)}} h_n^{-m} K((y-x)/h_n)\,dF(y) \Big| + \varepsilon/2 + (K_1N_1+K_1N_2+\theta) h_n^{-m} \sup_{A_0 \in G_n} |\mu_n(A_0)-\mu(A_0)|$$

where $N_1$, $N_2$, $G_n$ are defined as in the proof of theorem 4.9. For the first term on the right hand side of the last inequality, we argue as follows. We can upper bound it by

$$2 \sup_x \int_{A^{*c}_{(x,\rho h_n)}} h_n^{-m} u(\|(y-x)/h_n\|)\,dF(y) + \sup_x \Big| \int_{A^{*c}_{(x,\rho h_n)}} h_n^{-m} u(\|(y-x)/h_n\|)\,dF_n(y) - \int_{A^{*c}_{(x,\rho h_n)}} h_n^{-m} u(\|(y-x)/h_n\|)\,dF(y) \Big|$$

the first term of which is not greater than

$$2 K_2 2^{2m-1} \int_\rho^\infty z^{m-1} u(z)\,dz < \varepsilon/8$$

by the choice of $\rho$. Next, let $u'(x) = u(\|x\|) I_{\{\|x\| \ge \rho\}}$, let $K_4 = \sup_x u'(x)$ and let $N_4$ be the least integer not smaller than $16K_4/\varepsilon h_n^m$. Let $G''_n$ be the class of all rectangles of $R^m$ with maximal distance between any two points in every rectangle not greater than $\mathrm{Max}(2m\rho h_n,\ 2h_n u^{-1}(K_4/N_4))$. Let

$$S_{j,N_4} = \{x : x \in R^m;\ K_4(j-1)/N_4 < u'(x) \le K_4 j/N_4\}$$

and

$$T_{j,N_4} = \{x : x \in R^m;\ K_4(j-1)/N_4 < u'(x)\} = \bigcup_{i=j}^{N_4} S_{i,N_4}.$$

Approximate $u'$ by

$$u''(x) = \sum_{j=1}^{N_4} (K_4(j-1)/N_4) I_{\{x \in S_{j,N_4}\}}$$

so that for all $x$ and $n$,

$$|u'(x)-u''(x)| \le K_4/N_4 \le \varepsilon h_n^m/16.$$


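We will show that

$$\sup_x \Big| \int_{A^{*c}_{(x,\rho h_n)}} h_n^{-m} u(\|(y-x)/h_n\|)\,dF_n(y) - \int_{A^{*c}_{(x,\rho h_n)}} h_n^{-m} u(\|(y-x)/h_n\|)\,dF(y) \Big| \le \varepsilon/8 + 2K_4 h_n^{-m} \sup_{A_0 \in G''_n} |\mu_n(A_0)-\mu(A_0)|.$$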

Notice first that for all $1 \le j \le N_4-1$, $S_{j,N_4} = T_{j,N_4} \cap T^c_{j+1,N_4}$ and that every $T_{j,N_4(x,h_n)}$ is the difference of two concentric nested rectangles of $R^m$ (we use the continuity and monotonicity of $u'$ on $\{\|x\| \ge \rho\}$ and the fact that $u'=0$ on $\{\|x\| < \rho\}$). Let

$$T_{j,N_4(x,h_n)} = T^1_{j,N_4(x,h_n)} \cap (T^2_{j,N_4(x,h_n)})^c$$

where

$$T^1_{j,N_4(x,h_n)} = \{y : y \in R^m;\ K_4(j-1)/N_4 < u(\|(y-x)/h_n\|)\},$$

$$T^2_{j,N_4(x,h_n)} = A^*_{(x,\rho h_n)}.$$

It is clear that both rectangles belong to $G''_n$. Indeed, if $j \ge 2$, then

$$T^1_{j,N_4(x,h_n)} \subset \{y : y \in R^m : \|y-x\| < h_n u^{-1}(K_4/N_4)\}.$$

Thus,
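$$\sup_x \Big| \int_{A^{*c}_{(x,\rho h_n)}} h_n^{-m} u(\|y-x\|/h_n)\,dF_n(y) - \int_{A^{*c}_{(x,\rho h_n)}} h_n^{-m} u(\|y-x\|/h_n)\,dF(y) \Big|$$

$$= \sup_x \Big| \int h_n^{-m} u'((y-x)/h_n)\,dF_n(y) - \int h_n^{-m} u'((y-x)/h_n)\,dF(y) \Big|$$

$$\le \sup_x \Big| \int h_n^{-m} u''((y-x)/h_n)\,dF_n(y) - \int h_n^{-m} u''((y-x)/h_n)\,dF(y) \Big| + \sup_x \int h_n^{-m} |u''((y-x)/h_n) - u'((y-x)/h_n)|\,dF_n(y) + \sup_x \int h_n^{-m} |u''((y-x)/h_n) - u'((y-x)/h_n)|\,dF(y)$$

$$\le \varepsilon/8 + K_4 h_n^{-m} \sup_x \sum_{j=2}^{N_4} N_4^{-1} |\mu_n(T_{j,N_4(x,h_n)}) - \mu(T_{j,N_4(x,h_n)})|$$

$$\le \varepsilon/8 + K_4 h_n^{-m} \sup_{x;\,j \ge 2} |\mu_n(T_{j,N_4(x,h_n)}) - \mu(T_{j,N_4(x,h_n)})|$$

$$\le \varepsilon/8 + 2K_4 h_n^{-m} \sup_{A_0 \in G''_n} |\mu_n(A_0)-\mu(A_0)|$$

which was to be shown.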

Recalling from the proof of theorem 4.9 that $G_n \subset G''_n$, we collect bounds and obtain

$$\sup_x |f_n(x)-E\{f_n(x)\}| \le 3\varepsilon/4 + (K_1N_1+K_1N_2+\theta+2K_4) h_n^{-m} \sup_{A_0 \in G''_n} |\mu_n(A_0)-\mu(A_0)|$$

and

$$P\{\sup_x |f_n(x)-E\{f_n(x)\}| \ge \varepsilon\} \le P\{\sup_{A_0 \in G''_n} |\mu_n(A_0)-\mu(A_0)| \ge K_5 h_n^m\}$$

where

$$K_5 = \varepsilon/(4(K_1N_1+K_1N_2+\theta+2K_4)).$$

Now, note that if $c_n = \mathrm{Max}(2m\rho h_n;\ 2h_n u^{-1}(K_4/N_4))$ and $b_n = (4u^{-1}(\varepsilon h_n^m/32))^m$, then

$$\sup_x \mu(S(x,2c_n)) \le K_2 (2c_n)^m \le K_2 h_n^m b_n$$

for all $n$ large enough in view of the choice of $N_4$, the monotonicity of $u$ and $h_n \to 0$. From (3.29),

$$P\{\sup_{A_0 \in G''_n} |\mu_n(A_0)-\mu(A_0)| \ge K_5 h_n^m\} \le 4(1+2n)^{2m} e^{-n h_n^m K_5^2/(64K_2 b_n + 4K_5)} + 4n e^{-n h_n^m K_2 b_n/10}$$

for all $n$ large enough so that

(i) $4u^{-1}(K_4/N_4) \ge 4m\rho$,

(ii) $K_2 h_n^m b_n \le 1/2$,

(iii) $K_2 n h_n^m b_n \ge 1$, and

(iv) $n h_n^{2m} K_5^2 \ge 8K_2 h_n^m b_n$.

Theorem 4.10 now follows from (4.35) and the Borel-Cantelli lemma.

Q.E.D.

Proof of theorem 4.11

Let $x \in Q(\mu)$ and let $f$ be continuous at $x$. Let $T_j = (2\|X_j^x - x\|)^m$, $1 \le j \le n$. Thus, $T_j$ is the volume of the sphere centered at $x$ with radius $\|X_j^x - x\|$. Then, with an arbitrary $\varepsilon>0$,

$$P\{|f_n(x)-f(x)| > \varepsilon\} \le P\Big\{\sum_{j=k_{1n}}^{k_{2n}} a_{jn} |j/nT_j - f(x)| > \sum_{j=k_{1n}}^{k_{2n}} a_{jn} \varepsilon\Big\} \le (k_{2n}-k_{1n}+1) \sup_{k_{1n} \le j \le k_{2n}} P\{|j/nT_j - f(x)| > \varepsilon\}.$$

Let $k_{1n} \le j \le k_{2n}$. Then, with $f(x) > \varepsilon$,

$$\{|j/nT_j - f(x)| > \varepsilon\} = \{T_j < j/n(f(x)+\varepsilon)\} \cup \{T_j > j/n(f(x)-\varepsilon)\}$$

and with $f(x) \le \varepsilon$,

$$\{|j/nT_j - f(x)| > \varepsilon\} = \{T_j < j/n(f(x)+\varepsilon)\}.$$

Assume first that $f(x) > \varepsilon$. Let $K_1 = (f(x)-\varepsilon/2)/(f(x)-\varepsilon)$, $K_2 = \varepsilon/4(f(x)-\varepsilon)$, $K_3 = (f(x)+\varepsilon/2)/(f(x)+\varepsilon)$ and $K_4 = \varepsilon/2(f(x)+\varepsilon)$. Let

$$Y_i = I_{\{(2\|X_i-x\|)^m \le j/n(f(x)-\varepsilon)\}},\quad 1 \le i \le n.$$

It is clear that $Y_1,\ldots,Y_n$ are iid random variables and that

$$P\{Y_1=1\} = E\{Y_1\} = P\{(2\|X_1-x\|)^m \le j/n(f(x)-\varepsilon)\} \in [(j/n)(f(x)-\varepsilon/2)/(f(x)-\varepsilon),\ (j/n)(f(x)+\varepsilon/2)/(f(x)-\varepsilon)]$$

for all $n$ large enough (where the "large enough" does not depend upon $j$). Indeed, since $k_{2n}/n \to 0$ and $j \le k_{2n}$, we let $N$ be so large that

$$(k_{2n}/n(f(x)-\varepsilon))^{1/m}/2$$

is for all $n \ge N$ smaller than some $\delta>0$, where $\delta$ is such that $\|x-y\| < \delta$ implies that $|f(x)-f(y)| < \varepsilon/2$ by the continuity of $f$ at $x$. Obviously, for all $n \ge N$, we also have,

$$P\{(2\|X_1-x\|)^m \le j/n(f(x)+\varepsilon)\} \in [(j/n)(f(x)-\varepsilon/2)/(f(x)+\varepsilon),\ (j/n)(f(x)+\varepsilon/2)/(f(x)+\varepsilon)].$$
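So,

$$P\{T_j > j/n(f(x)-\varepsilon)\} \le P\Big\{\sum_{i=1}^n Y_i < j\Big\}$$

$$\le P\Big\{\sum_{i=1}^n (Y_i - E\{Y_i\}) < j - j(f(x)-\varepsilon/2)/(f(x)-\varepsilon)\Big\}$$

$$\le P\Big\{\sum_{i=1}^n (Y_i - E\{Y_i\})/n < -(k_{1n}/n)(\varepsilon/2(f(x)-\varepsilon))\Big\}$$

$$\le P\Big\{\sum_{i=1}^n (Y_i - E\{Y_i\})/n < -K_2 k_{1n}/n\Big\}$$

$$\le e^{-n(K_2 k_{1n}/n)^2/(2(K_1+4K_2)k_{2n}/n + K_2 k_{1n}/n)}$$

$$\le e^{-(k_{1n}^2/k_{2n})(K_2^2/(2K_1+9K_2))}$$

where we use the one-sided version of Bennett's inequality (Bennett, 1962) which is applicable because the $Y_i$ are iid random variables with $E\{(Y_1-E\{Y_1\})^2\} \le E\{Y_1\} \le (j/n)(f(x)+\varepsilon/2)/(f(x)-\varepsilon) = (K_1+4K_2)j/n \le (K_1+4K_2)k_{2n}/n$ and ess sup $Y_1$ - ess inf $Y_1 \le 1$. The inequalities are valid for $n \ge N$.

Similarly, with $Z_i = I_{\{(2\|X_i-x\|)^m \le j/n(f(x)+\varepsilon)\}}$, we have that $E\{(Z_1-E\{Z_1\})^2\} \le E\{Z_1\} \le K_3 k_{2n}/n$, ess sup $Z_1$ - ess inf $Z_1 \le 1$ and $Z_1,\ldots,Z_n$ are independent identically distributed. Thus, by Bennett's inequality, for all $n \ge N$,

$$P\{T_j < j/n(f(x)+\varepsilon)\} \le P\Big\{\sum_{i=1}^n Z_i \ge j\Big\}$$

$$= P\Big\{\sum_{i=1}^n (Z_i - E\{Z_i\}) \ge j - nE\{Z_1\}\Big\}$$

$$\le P\Big\{\sum_{i=1}^n (Z_i - E\{Z_i\}) \ge j - j(f(x)+\varepsilon/2)/(f(x)+\varepsilon)\Big\}$$

$$= P\Big\{\sum_{i=1}^n (Z_i - E\{Z_i\}) \ge j\varepsilon/2(f(x)+\varepsilon)\Big\}$$

$$\le P\Big\{\sum_{i=1}^n (Z_i - E\{Z_i\})/n \ge K_4 k_{1n}/n\Big\}$$

$$\le e^{-n(K_4 k_{1n}/n)^2/(2K_3 k_{2n}/n + K_4 k_{1n}/n)}$$

$$\le e^{-(k_{1n}^2/k_{2n})(K_4^2/(2K_3+K_4))}.$$

Thus, for all $n \ge N$,

$$\sup_{k_{1n} \le j \le k_{2n}} P\{|j/nT_j - f(x)| \ge \varepsilon\} \le 2e^{-C_1 k_{1n}^2/k_{2n}}$$

where $C_1 = \mathrm{Min}\{K_4^2/(2K_3+K_4),\ K_2^2/(2K_1+9K_2)\}$. So, for $n \ge N$ and $f(x) > \varepsilon$,

$$P\{|f_n(x)-f(x)| \ge \varepsilon\} \le 2(k_{2n}-k_{1n}+1)e^{-C_1 k_{1n}^2/k_{2n}} \le 2n e^{-C_1 k_{1n}^2/k_{2n}}.$$

The theorem follows trivially. For the wp1 convergence part of the theorem, the Borel-Cantelli lemma is used. For $f(x) \le \varepsilon$ choose $N^*$ so large that

$$(k_{2n}/n(f(x)+\varepsilon))^{1/m}/2$$

is smaller than $\delta$ for all $n \ge N^*$. For such $n$, by a similar argument,

$$P\{|f_n(x)-f(x)| \ge \varepsilon\} \le (k_{2n}-k_{1n}+1)e^{-C_2 k_{1n}^2/k_{2n}} \le n e^{-C_2 k_{1n}^2/k_{2n}}$$

where $C_2 = K_4^2/(2K_3+K_4)$.

Q.E.D.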
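The quantity $j/nT_j$ that drives the proof is easy to compute. The sketch below assumes real-valued data ($m=1$), so that $T_k = 2|X_k^x - x|$, and uses a single index $k$ rather than a weighted combination over $k_{1n} \le j \le k_{2n}$; the function name and the sample are hypothetical, and the code assumes the $k$-th nearest observation does not coincide with $x$.

```python
def nearest_neighbor_density(x, sample, k):
    """k / (n * T_k), with T_k = 2 * (distance from x to its k-th nearest observation)."""
    distances = sorted(abs(xi - x) for xi in sample)
    volume = 2.0 * distances[k - 1]   # length of the interval reaching the k-th neighbor
    return k / (len(sample) * volume)

sample = [i / 10.0 for i in range(-5, 6)]   # 11 equally spaced points on [-0.5, 0.5]
# The three nearest observations to 0 are 0, -0.1 and 0.1, so T_3 = 0.2.
estimate = nearest_neighbor_density(0.0, sample, k=3)
print(estimate)
```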

Proof of theorem 4.12

Let $\varepsilon>0$ be arbitrary and let $K_0 = \sup_x f(x)$, where $f$ is a uniformly continuous density for $\mu$. Pick $\delta>0$ such that

$$\sup_x \sup_{y : \|y-x\| < \delta} |f(y)-f(x)| < \varepsilon/2$$

and find $N$ so that for all $n \ge N$,

$$4k_{2n}/n\varepsilon < \delta$$

which is possible in view of $k_{2n}/n \to 0$. Now, with $T_j^x = (2\|X_j^x - x\|)^m$, $x \in R^m$, $1 \le j \le n$, we have

$$P\{\sup_x |f_n(x)-f(x)| > \varepsilon\} \le nP\Big\{\bigcup_x \bigcup_{k_{1n} \le j \le k_{2n}} \{|j/nT_j^x - f(x)| > \varepsilon\}\Big\}$$

$$\le nP\Big\{\bigcup_x \bigcup_{k_{1n} \le j \le k_{2n}} \{T_j^x < j/n(f(x)+\varepsilon)\}\Big\} + nP\Big\{\bigcup_{x : f(x) > \varepsilon} \bigcup_{k_{1n} \le j \le k_{2n}} \{T_j^x > j/n(f(x)-\varepsilon)\}\Big\}$$

$$\le nP\Big\{\bigcup_x \bigcup_{k_{1n} \le j \le k_{2n}} \{T_j^x < j/n(f(x)+\varepsilon)\}\Big\} + nP\Big\{\bigcup_{x : f(x) > \varepsilon} \bigcup_{k_{1n} \le j \le k_{2n}} \{T_j^x > j/n(f(x)-3\varepsilon/4)\}\Big\}.$$

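Let $S(x,r) = \{y : y \in R^m,\ (2\|y-x\|)^m \le r\}$ where $x \in R^m$ and $r \ge 0$. $S(x,r)$ is clearly a product of $m$ intervals from $R$. Let

$$G_n = \{S(x,r) : x \in R^m,\ r \le 4k_{2n}/n\varepsilon\}.$$

Note that $r$ can be considered as the volume of the sphere $S(x,r)$. For all $n \ge N$, we have,

$$\bigcup_x \bigcup_{k_{1n} \le j \le k_{2n}} \{T_j^x < j/n(f(x)+\varepsilon)\}$$

$$\subset \bigcup_x \bigcup_{k_{1n} \le j \le k_{2n}} \bigcup_{r \le k_{2n}/n\varepsilon} \{|\mu_n(S(x,r))-\mu(S(x,r))| > j\varepsilon/2n(f(x)+\varepsilon)\}$$

$$\subset \bigcup_x \bigcup_{r \le k_{2n}/n\varepsilon} \{|\mu_n(S(x,r))-\mu(S(x,r))| > k_{1n}\varepsilon/2n(K_0+\varepsilon)\}$$

$$\subset \bigcup_{A \in G_n} \{|\mu_n(A)-\mu(A)| > k_{1n}\varepsilon/2n(K_0+\varepsilon)\}$$

where the only hard step is the second one.

Because $j/n(f(x)+\varepsilon) < k_{2n}/n\varepsilon < \delta$, we know that $T_j^x < j/n(f(x)+\varepsilon)$ implies that there exists a sphere of $R^m$ centered at $x$ which contains at least $j$ of the observations $X_1,\ldots,X_n$ while for all $y$ in the same sphere, $|f(y)-f(x)| < \varepsilon/2$. Further, the volume of the sphere cannot be greater than $k_{2n}/n\varepsilon$. Call this sphere $S(x,\lambda)$ and note that $\mu_n(S(x,\lambda)) \ge j/n$ and $\mu(S(x,\lambda)) < (f(x)+\varepsilon/2)j/n(f(x)+\varepsilon)$ if $j$ and $x$ are fixed. This explains the second and crucial implication. In a similar fashion,

$$\bigcup_{x : f(x) > \varepsilon} \bigcup_{k_{1n} \le j \le k_{2n}} \{T_j^x > j/n(f(x)-3\varepsilon/4)\}$$

$$\subset \bigcup_{x : f(x) > \varepsilon} \bigcup_{k_{1n} \le j \le k_{2n}} \bigcup_{r \le 4k_{2n}/n\varepsilon} \{|\mu_n(S(x,r))-\mu(S(x,r))| > j\varepsilon/4nK_0\}$$

$$\subset \bigcup_x \bigcup_{r \le 4k_{2n}/n\varepsilon} \{|\mu_n(S(x,r))-\mu(S(x,r))| \ge k_{1n}\varepsilon/4nK_0\}$$

$$\subset \bigcup_{A \in G_n} \{|\mu_n(A)-\mu(A)| \ge k_{1n}\varepsilon/4nK_0\}$$

where we used the fact that if for a fixed $j$ and $x$, $T_j^x > j/n(f(x)-3\varepsilon/4)$, then there exists a sphere $S(x,\lambda)$ with $\lambda < 4k_{2n}/n\varepsilon < \delta$ and $\mu_n(S(x,\lambda)) < j/n$ and $\mu(S(x,\lambda)) \ge (f(x)-\varepsilon/2)j/n(f(x)-3\varepsilon/4) \ge (j/n)(1+\varepsilon/4(f(x)-3\varepsilon/4)) \ge (j/n)(1+\varepsilon/4K_0)$. Combining bounds yields, for all $n \ge N$,

$$P\{\sup_x |f_n(x)-f(x)| > \varepsilon\} \le n\Big(P\Big\{\bigcup_{A \in G_n} \{|\mu_n(A)-\mu(A)| > (k_{1n}/n)(\varepsilon/2(K_0+\varepsilon))\}\Big\} + P\Big\{\bigcup_{A \in G_n} \{|\mu_n(A)-\mu(A)| > (k_{1n}/n)(\varepsilon/4K_0)\}\Big\}\Big)$$

$$\le 2nP\Big\{\bigcup_{A \in G_n} \{|\mu_n(A)-\mu(A)| > (k_{1n}/n)(\varepsilon/4(K_0+\varepsilon))\}\Big\}.$$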

Let $K_2 = \varepsilon/4(K_0+\varepsilon)$, $K_3 = 2^m K_0 4/\varepsilon$ and $K_4 = K_2^2/(64K_3+4K_2)$. Then we can reason as follows. The maximum distance between any two points in any rectangle in $G_n$ is $R_0 = (4k_{2n}/n\varepsilon)^{1/m}$. Let $G'_n$ be the class of all the rectangles from $R^m$ for which the maximal distance between any two points in any rectangle is $2R_0$. Then

$$\sup_{A \in G'_n} \mu(A) \le K_0 (2R_0)^m = K_0 2^m 4k_{2n}/n\varepsilon.$$

From (3.29), lemma 3.4 and lemma 3.6 we have

$$2nP\Big\{\bigcup_{A \in G_n} \{|\mu_n(A)-\mu(A)| \ge K_2 k_{1n}/n\}\Big\} \le 2n \cdot 4(1+2n)^{2m} e^{-n(K_2 k_{1n}/n)^2/(64K_3 k_{2n}/n + 4K_2 k_{1n}/n)} + 8n^2 e^{-nK_3 k_{2n}/10n}$$

for all $n$ large enough so that

(i) $nK_3 k_{2n}/n \ge 1$,

(ii) $n(K_2 k_{1n}/n)^2 \ge 8K_3 k_{2n}/n$

and

(iii) $K_3 k_{2n}/n \le 1/2$.

We remark that it is possible to find such large $n$ in view of $k_{2n} \to \infty$, $k_{2n}/n \to 0$ and $k_{1n}^2/k_{2n} \to \infty$. Thus, for all $n$ large enough,

$$P\{\sup_x |f_n(x)-f(x)| > \varepsilon\} \le 8n(1+2n)^{2m} e^{-K_4 k_{1n}^2/k_{2n}} + 8n^2 e^{-K_3 k_{2n}/10}.$$

Theorem 4.12 follows by the Borel-Cantelli lemma, $k_{2n} \ge k_{1n}^2/k_{2n}$ and (4.44). Q.E.D.


Chapter 5 NONPARAMETRIC DISCRIMINATION
5.1 Introduction

In chapter 2 we established a link between the convergence of $\delta_n$ to $\delta^*$ and of $L_n$ to $L^*$. In particular, theorem 2.1 is of general interest to the statistician who wants to ascertain the asymptotic optimality of the discrimination rule. In this chapter we assume that the statistician has no a priori knowledge about the distribution of $(X,\theta)$, that is, he does not know that the distribution of $(X,\theta)$ belongs to any prespecified parametric family of distributions. The rules that are studied in this chapter are therefore referred to as nonparametric discrimination rules. For surveys on nonparametric discrimination, see, for instance, Cover and Wagner (1975). For discrimination in general, see Duda and Hart (1973), Fukunaga (1972), Nagy (1968), Ho and Agrawala (1968) and Sebestyen (1962).

Assume that all of the $\mu_j$ of (2.1) have densities $f_j$ that are $\mu$-almost everywhere continuous. In section 2.2 we showed how to construct an asymptotically optimal decision rule using consistent density estimates of the $f_j$. In sections 5.3 and 5.4 we discuss in more detail the asymptotic optimality of such two-step rules if the statistician chooses to use nonparametric density estimates such as the kernel estimate and the Loftsgaarden-Quesenberry estimate. A natural modification of these rules considerably simplifying their formulation is also introduced, and conditions for their asymptotic optimality are given.

It is quite possible that the statistician does not know that $X$ has a density $f$ and that he does not want to make any assumptions concerning the $\mu_j$. In that case, he is not sure whether the two-step rules with kernel estimates are asymptotically optimal. However, even without restrictions on the $\mu_j$ it is possible to find an asymptotically optimal discrimination rule, provided that there exist $\mu$-almost everywhere continuous versions for the $P\{\theta=j \mid X=x\}$, $1 \le j \le M$. In the next section, we present a generalized nearest neighbor rule that is asymptotically optimal in such discrimination problems, which we will refer to as type c3 discrimination problems.

5.2 A Generalized Nearest Neighbor Rule

One of the simplest and most thoroughly studied discrimination rules is the nearest neighbor rule (Fix and Hodges, 1951; Cover and Hart, 1967) which lets $\hat\theta_{V_n,X} = \theta_i$ if $X_i$ is the nearest neighbor to $X$ among $X_1,\ldots,X_n$. It is well-known that the nearest neighbor rule is not asymptotically optimal (Cover and Hart, 1967; Wagner, 1971) but that a modified version, the k-nearest neighbor rule, can be asymptotically optimal if $k$ is allowed to vary with $n$ such that $k \to \infty$ and $k/n \to 0$ (Fix and Hodges, 1951). It is this result that we wish to generalize in this section.

Consider a probability distribution $v_n = (v_{n1},\ldots,v_{nn})$, here called a weight vector. The rules we are interested in can be roughly described as follows. A weight $v_{ni}$ is given to the state of the i-th closest neighbor to $X$. The estimate for the state of $X$ is obtained by adding all the weights for each possible state value and selecting the one with the largest total weight. As we will see, however, some care must be taken in handling ties in distance.
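The rough description above translates directly into code. The following is a minimal sketch rather than the formal rule: it assumes the Euclidean norm, breaks distance ties by a user-supplied key (by default the index of the observation), and returns the smallest state label when several states share the largest total weight. The names `weighted_nn_decision` and `tie_keys` are hypothetical.

```python
import math
from collections import defaultdict

def weighted_nn_decision(x, data, weights, tie_keys=None):
    """Pick a state with the largest total weight among the ranked neighbors of x.

    data is a list of (point, state) pairs; weights[i] is given to the state of
    the (i+1)-th closest neighbor of x.
    """
    if tie_keys is None:
        tie_keys = list(range(len(data)))
    # Rank the observations by distance to x; equal distances are ordered by tie_keys.
    order = sorted(range(len(data)),
                   key=lambda i: (math.dist(x, data[i][0]), tie_keys[i]))
    totals = defaultdict(float)
    for rank, i in enumerate(order):
        totals[data[i][1]] += weights[rank]
    best = max(totals.values())
    return min(state for state, total in totals.items() if total == best)

data = [((0.0,), 1), ((0.2,), 2), ((0.3,), 2), ((2.0,), 1)]
# All weight on the closest neighbor: the nearest neighbor rule.
nearest = weighted_nn_decision((0.05,), data, [1.0, 0.0, 0.0, 0.0])
# Equal weight on the three closest neighbors: a 3-nearest neighbor vote.
three_nn = weighted_nn_decision((0.05,), data, [1 / 3, 1 / 3, 1 / 3, 0.0])
print(nearest, three_nn)
```

The two calls differ only in the weight vector, which is the point of the generalization: the nearest neighbor and k-nearest neighbor rules are special cases.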

Let $\|\cdot\|$ be a norm on $R^m$ and let the statistician attach numbers $Z_i$ to the $(X_i,\theta_i)$ where the $Z_i$, $1 \le i \le n$, are random variables that are independent of $(X,\theta)$ and $V_n$. Let $V'_n = (X_1,\theta_1,Z_1),\ldots,(X_n,\theta_n,Z_n)$ be the enlarged data. The $Z_i$ are used to break ties and we therefore require that all the $Z_i$ take different values with probability one, e.g., $Z_i = i$, $1 \le i \le n$. Given $x \in R^m$, the permutation $V'^x_n$ of $V'_n$ is thus with probability one well-defined. We will write

$$V'^x_n = (X_1^x,\theta_1^x,Z_1^x),\ldots,(X_n^x,\theta_n^x,Z_n^x)$$

where $\|X_1^x-x\| \le \ldots \le \|X_n^x-x\|$ and where the order is determined by the $Z_i^x$ whenever $\|X_i^x-x\| = \|X_{i+1}^x-x\|$, $1 \le i \le n-1$. Let us include the $Z_i$ in the definition of the conditional probability of error, say $L_n = P\{\hat\theta_{V'_n,X} \ne \theta \mid V'_n\}$ where $\hat\theta_{V'_n,X}$ is a random variable that is conditionally independent of $\theta$ given $V'_n$ and $X$ with

$$P\{\hat\theta_{V'_n,X} = j \mid V'_n, X\} = \delta_{nj}(V'_n,X)$$

and where $\delta_n = (\delta_{n1},\ldots,\delta_{nM})$ is our decision function, that is, a Borel measurable mapping from $(R^m \times \{1,\ldots,M\} \times R)^n \times R^m$ to $[0,1]^M$ such that the $\delta_{nj}(V'_n,x)$, $1 \le j \le M$, sum to one.

To each $(X_i^x,\theta_i^x,Z_i^x)$ the statistician attaches a weight $v_{ni}$, $1 \le i \le n$, and then computes the total weights that are given to each of the states $j$,

$$W_j^x = \sum_{i=1}^n v_{ni} I_{\{\theta_i^x = j\}},\quad 1 \le j \le M.$$

His decision function $\delta_n$ then must satisfy

$$\delta_{nj}(V'_n,x) = 0 \text{ whenever } W_j^x < \max_{1 \le i \le M} W_i^x,\quad 1 \le j \le M,\ x \in R^m, \qquad (5.2)$$

that is, he must always choose one of the state values with the largest weight. We prove the following theorem.

Theorem 5.1. If (2.1) defines a type c3 discrimination problem and if $\{\delta_n\}$ is a decision rule satisfying (5.1)-(5.2) for all $n$, where

$$\sup_i v_{ni} \to 0, \qquad (5.3)$$

$$k_n/n \to 0 \qquad (5.4)$$

and

$$\sum_{i=k_n+1}^{n} v_{ni} \to 0 \qquad (5.5)$$

for some sequence of integers $\{k_n\}$, then $L_n \to L^*$ in probability. If, in addition,

$$\sum_{n=1}^{\infty} e^{-\alpha/\sup_i v_{ni}} < \infty \text{ for all } \alpha>0, \qquad (5.6)$$

then $L_n \to L^*$ wp1. Remark that

$$(\log n) \sup_i v_{ni} \to 0 \qquad (5.7)$$

is sufficient for (5.6).


The condition (5.3) insures that every weight $v_{ni}$ is asymptotically negligible, while (5.5) makes the tail of the weight vector $v_n$ asymptotically negligible. It seems natural to attach larger weights to nearer neighbors (i.e., $v_{n1} \ge v_{n2} \ge \ldots \ge v_{nn}$) but this is by no means necessary to insure the asymptotic optimality of the decision rule. Examples of sequences $\{v_n\}$ satisfying the conditions of theorem 5.1 are given below.

(i) rectangular weight vector. Let $1 \le k_n \le n$ for all $n$, and

$$v_{ni} = 1/k_n,\ 1 \le i \le k_n; \qquad v_{ni} = 0, \text{ otherwise},$$

where $k_n \to \infty$ and $k_n/n \to 0$. To satisfy (5.6), let additionally $\sum_{n=1}^{\infty} e^{-\alpha k_n} < \infty$ for all $\alpha>0$. Notice that with this choice of $\{v_n\}$, the decision rule reduces to the $k_n$-nearest neighbor rule (Cover and Hart, 1967).

(ii) triangular weight vector. Let $1 \le k_n \le n$ for all $n$, and let

$$v_{ni} = 2(k_n-i+1)/(k_n^2+k_n),\ 1 \le i \le k_n; \qquad v_{ni} = 0, \text{ otherwise},$$

where $k_n \to \infty$ and $k_n/n \to 0$. To satisfy (5.6), let additionally $\sum_{n=1}^{\infty} e^{-\alpha k_n} < \infty$ for all $\alpha>0$.

(iii) exponential weight vector. Let

$$v_{ni} = (\alpha_n/(1-(1+\alpha_n)^{-n}))/(1+\alpha_n)^i,\quad 1 \le i \le n,$$

where $\alpha_n \in (0,\infty)$ for all $n$. The conditions (5.3)-(5.5) are satisfied if $\alpha_n \to 0$ and $n\alpha_n \to \infty$ (to see this, let $k_n \sim \sqrt{n/\alpha_n}$ in (5.4) and (5.5)). Furthermore, $\alpha_n \log n \to 0$ is sufficient for (5.6) and (5.7).
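The three weight vectors can be generated and inspected as follows. This is a sketch with hypothetical function names; the particular values $n=1000$, $k_n=30$ and $\alpha_n=0.05$ are chosen only for illustration. For each vector the code prints the total weight, the largest weight (the quantity in (5.3)) and the tail weight beyond the first 100 neighbors (the quantity in (5.5) with $k_n=100$).

```python
def rectangular_weights(n, k):
    """v_ni = 1/k for i <= k, and 0 otherwise."""
    return [1.0 / k if i <= k else 0.0 for i in range(1, n + 1)]

def triangular_weights(n, k):
    """v_ni = 2(k - i + 1)/(k^2 + k) for i <= k, and 0 otherwise."""
    return [2.0 * (k - i + 1) / (k * k + k) if i <= k else 0.0
            for i in range(1, n + 1)]

def exponential_weights(n, a):
    """v_ni = (a / (1 - (1 + a)^(-n))) / (1 + a)^i for 1 <= i <= n."""
    scale = a / (1.0 - (1.0 + a) ** (-n))
    return [scale / (1.0 + a) ** i for i in range(1, n + 1)]

n = 1000
examples = {
    "rectangular": rectangular_weights(n, k=30),
    "triangular": triangular_weights(n, k=30),
    "exponential": exponential_weights(n, a=0.05),
}
for name, v in examples.items():
    print(name, round(sum(v), 6), round(max(v), 4), round(sum(v[100:]), 6))
```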

Let us briefly comment on the way distance ties are broken. For convenience, we introduced the random variables $Z_i$ to uniquely define $V'^x_n$ given $V'_n$. However, theorem 5.1 remains valid if the decision rule is modified as follows. Let $\xi_i^x$, $1 \le i \le n$, be the number of $X_k$'s for which $\|X_k-x\| \le \|X_i-x\|$. Thus, all the $\xi_i^x$ are positive integer valued random variables. Let

^ = ll  ^1  Vnk  l[\si  )1{||^-x||-||xjt-x||}  7 £ ' (5>8) 

that is, if $\|X_1^x-x\| = \|X_2^x-x\| < \|X_3^x-x\|$ for instance, then both $(X_1^x,\theta_1^x)$ and $(X_2^x,\theta_2^x)$ carry an equal weight $(v_{n1}+v_{n2})/2$. Notice that the sums $W_j^x$ in (5.8) are independent of the $Z_i$'s. Thus it is possible to write $\hat\theta_{V_n,X}$ as in chapter 2 and to speak of $L_n = P\{\hat\theta_{V_n,X} \ne \theta \mid V_n\}$ and $\delta_n(V_n,x)$.
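The equal sharing of weight among tied observations can be sketched as follows. The code implements only the verbal description (observations at the same distance share the average of the weights of the ranks they occupy); it is not a transcription of (5.8), and the function name is hypothetical.

```python
def tie_averaged_weights(distances, weights):
    """Return, for each observation, its weight after averaging over distance ties."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    result = [0.0] * len(distances)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next observation lies at the same distance.
        while (end + 1 < len(order)
               and distances[order[end + 1]] == distances[order[start]]):
            end += 1
        # Every member of the group receives the mean weight of ranks start..end.
        shared = sum(weights[start:end + 1]) / (end - start + 1)
        for pos in range(start, end + 1):
            result[order[pos]] = shared
        start = end + 1
    return result

# Observations 0 and 2 are tied for closest; they share (0.5 + 0.3) / 2.
averaged = tie_averaged_weights([1.0, 2.0, 1.0], [0.5, 0.3, 0.2])
print(averaged)
```

Because the result depends only on the distances, no auxiliary tie-breaking variables are needed.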

5.3 Distance Weighted Decision Rules

Assume that there exist $\mu$-almost everywhere continuous densities $f_1,\ldots,f_M$ for the probability measures $\mu_1,\ldots,\mu_M$ in (2.1). In chapter 4 we have shown under what conditions the Parzen-Rosenblatt density estimates $f_{jn}$ converge to $f_j$, $1 \le j \le M$. Therefore, these density estimates can be used to construct an asymptotically optimal decision rule by following the technique given in section 2.2 (see (2.17)-(2.18)). As in (2.13), let $N_{jn}$ be the number of observations $X_i$ from $V_n$ for which $\theta_i = j$. Let $\{h_n\}$ be a sequence from $(0,\infty)$ used in the Parzen-Rosenblatt estimator, then the two-step rule that uses the kernel estimator with kernel $K$ is a sequence of decision functions $\delta_n$ satisfying

$$\delta_{nj}(V_n,x) = 0 \text{ whenever } W_j^x < \max_{1 \le i \le M} W_i^x,\quad 1 \le j \le M,\ x \in R^m, \qquad (5.9)$$

for all $n$ where

$$W_j^x = \sum_{i=1}^{n} K((X_i-x)/h_{N_{jn}}) I_{\{\theta_i = j\}}/h^m_{N_{jn}},\quad 1 \le j \le M.$$

Some of the in-probability asymptotic properties of such decision rules can be found in Van Ryzin (1966). If the second derivatives of the $f_j$ exist and are continuous and if the kernel $K$ satisfies some additional regularity conditions, then Van Ryzin (1966) discusses the convergence to 0 of $P\{L_n-L^* > \varepsilon_n\}$ for sequences $\{\varepsilon_n\}$ with $\varepsilon_n \to 0$. As an immediate corollary of theorems 2.2 and 4.2, we can state theorem 5.2, the wp1 version of which is new.


Theorem 5.2. If (2.1) defines a type discrimination problem, if $K$ is a bounded probability density on $R^m$ with $\|x\|^m K(x) \to 0$ as $\|x\| \to \infty$ and if $\{h_n\}$ is a sequence from $(0,\infty)$ with

$$h_n \to 0$$

and

$$nh_n^m \to \infty$$

then any decision rule satisfying (5.9) for all $n$ is asymptotically optimal. If in addition

$$\sum_{n=1}^{\infty} e^{-\alpha n h_n^m} < \infty \text{ for all } \alpha>0$$

then $L_n-L^* \to 0$ wp1.


We note that (5.9) is not a very natural way of constructing a decision rule because the weight that is attached to each $(X_i,\theta_i)$ does not only depend upon $X_i-x$ but also on $N_{jn}$. This is undesirable because we want $\delta_n(V_n,x)$ to depend upon the $(X_i,\theta_i)$ for which $\|X_i-x\|$ is small. Consider the simpler decision rule defined as follows. Let $\delta_n$ be any decision function satisfying


and where $K$ and $\{h_n\}$ are as in theorem 5.2. To prove the asymptotic optimality of such decision rules we can employ theorem 2.1. However, using an argument that differs only in the details from the proof of theorem 5.1, it is possible to obtain upper bounds for $P\{L_n-L^* > \varepsilon\}$.
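A distance weighted rule of this simpler kind can be sketched in a few lines. The sketch assumes, and this is only an assumption, that the weight of state $j$ is the sum of $K((X_i-x)/h_n)$ over the observations with $\theta_i=j$, with one bandwidth for all states; the Euclidean norm and the function names are likewise hypothetical choices.

```python
import math
from collections import defaultdict

def kernel_rule(x, data, h, kernel):
    """Choose a state maximizing the sum of K((X_i - x)/h) over its observations."""
    totals = defaultdict(float)
    for point, state in data:
        u = [(p - c) / h for p, c in zip(point, x)]
        totals[state] += kernel(u)
    best = max(totals.values())
    # Ties in total weight are resolved in favor of the smallest state label.
    return min(state for state, total in totals.items() if total == best)

def ball_kernel(u):
    # Indicator of the unit ball: every observation within distance h counts once.
    return 1.0 if math.sqrt(sum(c * c for c in u)) <= 1.0 else 0.0

data = [((0.0,), 1), ((0.1,), 1), ((0.3,), 2), ((0.9,), 2), ((1.0,), 2)]
local = kernel_rule((0.05,), data, h=0.3, kernel=ball_kernel)  # votes: two for 1, one for 2
wide = kernel_rule((0.05,), data, h=2.0, kernel=ball_kernel)   # votes: two for 1, three for 2
print(local, wide)
```

With the indicator kernel the rule is a majority vote among the observations within distance $h$ of $x$, so the bandwidth controls how local the decision is.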
n J 

Theorem 5.3. If (2.1) defines a type discrimination problem, if $K$ is a bounded, integrable mapping from $R^m$ to $[0,\infty)$ with $\|x\|^m K(x) \to 0$ as $\|x\| \to \infty$ and if $\{h_n\}$ is a sequence from $(0,\infty)$ with $h_n \to 0$ and $nh_n^m \to \infty$, then any decision rule satisfying (5.10) for all $n$ is asymptotically optimal. If in addition

$$\sum_{n=1}^{\infty} e^{-\alpha n h_n^m} < \infty \text{ for all } \alpha>0$$

then $L_n-L^* \to 0$ wp1. In particular, for every $\varepsilon>0$ there exist constants $K_1>0$, $N_1>0$ (both depending upon $\varepsilon$, (2.1) and $K$) such that for all $n \ge N_1$,

$$P\{L_n-L^* \ge \varepsilon\} \le (2/\varepsilon) e^{-K_1 n h_n^m}. \qquad (5.11)$$

We remark that the inequality (5.11) is only of academic importance because we can find distributions (2.1) defining type c discrimination problems such that $K_1$ is arbitrarily small.


5.4 Two-Step Rules With Loftsgaarden-Quesenberry Density Estimates

Consider the two-step rule that is obtained by using the Loftsgaarden-Quesenberry density estimates $f_{jn}$, $1 \le j \le M$, in (2.17)-(2.18). All we need to characterize this rule is a norm $\|\cdot\|$ on $R^m$ and a sequence $\{k_n\}$ of integers with $1 \le k_n \le n$. Let $N_{jn}$ denote the number of observations $X_i$ for which $\theta_i = j$. In particular, among those $X_i$, look for the $c_{jn}$-th nearest neighbor $X^{x,j}_{c_{jn}}$ to $x$ where $x \in R^m$ and $c_{jn} = k_{N_{jn}}$, and let $U^x_{jn}$ be the Lebesgue measure of the sphere centered at $x$ with $X^{x,j}_{c_{jn}}$ on its surface. Let $\{\delta_n\}$ be a decision rule satisfying

$$\delta_{nj}(V_n,x) = 0 \text{ whenever } W_j^x < \max_{1 \le i \le M} W_i^x,\quad 1 \le j \le M,\ x \in R^m, \qquad (5.12)$$


for  all  n where 


-UUM- 


(5.13) 


In particular, if $\|\cdot\|$ is one of the $L_r$ norms on $R^m$ ($1 \le r \le \infty$), then $U^x_{jn}$ can be replaced by $\|X^{x,j}_{c_{jn}}-x\|^m$, $1 \le j \le M$. From theorems 2.2 and 4.11 we can conclude, without proof, that the following is true.

Theorem 5.4. If (2.1) defines a type cj discrimination problem, if $\{\delta_n\}$ is a decision rule satisfying (5.12)-(5.13) for all $n$ and if

$$k_n \to \infty \text{ and } k_n/n \to 0 \qquad (5.14)$$

then $L_n-L^* \to 0$ in probability. If in addition

$$\sum_{n=1}^{\infty} e^{-\alpha k_n} < \infty \text{ for all } \alpha>0 \qquad (5.15)$$

then $L_n-L^* \to 0$ wp1.

Although this decision rule is computationally simple, it is unnatural because the $N_{jn}$, $1 \le j \le M$, directly influence $\delta_n$ through the $c_{jn}$, $1 \le j \le M$. This disrupts the local characteristics of the rule. The natural counterpart is obtained by replacing the $c_{jn}$ by $k_n$, $1 \le j \le M$. The resulting decision rule can be defined as follows. Let $\{\delta_n\}$ be a sequence of decision functions satisfying

$$\delta_{nj}(V_n,x) = 0 \text{ whenever } W_j^x > \min_{1 \le i \le M} W_i^x,\quad 1 \le j \le M,\ x \in R^m, \qquad (5.16)$$

for all $n$ where

$$W_j^x = \|X^{x,j}_{k_n}-x\| \text{ if } N_{jn} \ge k_n; \qquad W_j^x = \infty \text{ if } N_{jn} < k_n. \qquad (5.17)$$

It is not hard to show that any decision rule satisfying (5.16)-(5.17) for all $n$ is asymptotically optimal for all type cj discrimination problems. To prove this, observe that for every $x$, every event $\{\|X_i-x\| = \|X_l-x\|\}$, $i \ne l$, has zero probability. Thus, for $M=2$, we see that $L_n = L'_n$ wp1 where $L'_n$ is the conditional probability of error with the decision function discussed in section 5.2 with weight vector $v_n$ where

$$v_{ni} = 1/(2k_n-1),\ 1 \le i \le 2k_n-1; \qquad v_{ni} = 0,\ 2k_n \le i \le n,$$

and with any $Z_1,\ldots,Z_n$, provided that $2k_n-1 \le n$. Applying theorem 5.1 thus yields.

Theorem 5.5. If (2.1) defines a type cj discrimination problem and if $\{k_n\}$ is a sequence of integers satisfying (5.14) with $1 \le k_n \le n$ for all $n$, then any decision rule satisfying (5.16)-(5.17) for all $n$ is asymptotically optimal. If in addition (5.15) holds, then $L_n \to L^*$ wp1 as well.
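The rule (5.16)-(5.17) amounts to comparing, state by state, the distance from $x$ to the $k_n$-th nearest observation of that state. A minimal sketch follows, assuming the Euclidean norm and resolving exact ties in favor of the smallest state label; the function name and the sample are hypothetical.

```python
import math

def kth_neighbor_rule(x, data, k, states):
    """Choose a state whose k-th nearest observation to x is closest."""
    def score(state):
        d = sorted(math.dist(x, p) for p, s in data if s == state)
        # A state with fewer than k observations gets an infinite score.
        return d[k - 1] if len(d) >= k else math.inf
    scores = {s: score(s) for s in states}
    best = min(scores.values())
    return min(s for s, v in scores.items() if v == best)

data = [((0.1,), 1), ((0.2,), 1), ((0.9,), 1),
        ((0.15,), 2), ((1.0,), 2), ((1.1,), 2)]
# Second nearest observation of state 1 is at distance 0.2, of state 2 at distance 1.0.
decision = kth_neighbor_rule((0.0,), data, k=2, states=[1, 2])
print(decision)
```

For two states and no distance ties this coincides with a majority vote among the $2k_n-1$ nearest observations: in the example the three nearest observations carry the states 1, 2, 1.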

We remark that both classes of decision rules of this section are not asymptotically optimal for all type c3 discrimination problems. Let $m=1$, $M=2$, $\pi_1=2/3$, $\pi_2=1/3$, $k_n/n < 1/6$ and let $\mu_1$ and $\mu_2$ both put mass 1 at 0. Then, by Chebyshev's inequality, $P\{N_{1n} \ge k_n;\ N_{2n} \ge k_n\} \ge 1-2/4n(1/6)^2 = 1-18/n$. If $N_{1n} \ge k_n$ and $N_{2n} \ge k_n$, then $W_1^x = W_2^x = 0$ wp1. We pick $\delta_n$ such that $\delta_{n2}=1$ whenever $W_1^x = W_2^x$. Since $L^* = 1/3$, we have that

$$L_n \ge (2/3) I_{\{N_{1n} \ge k_n;\ N_{2n} \ge k_n\}}$$

and

$$P\{L_n-L^* \ge 1/3\} \ge P\{N_{1n} \ge k_n;\ N_{2n} \ge k_n\} \ge 1-18/n \to 1.$$
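The Chebyshev bound in this example is crude, which a direct computation makes visible. The sketch below assumes that $N_{1n}$ is binomial with parameters $n$ and $2/3$ and that $N_{2n} = n - N_{1n}$; the values $n=60$ and $k_n=9$ are chosen only for illustration and satisfy $k_n/n < 1/6$.

```python
from math import comb

def prob_both_classes_have_k(n, k, p1=2.0 / 3.0):
    """P{N1 >= k and N2 >= k} when N1 ~ Binomial(n, p1) and N2 = n - N1."""
    return sum(comb(n, j) * p1 ** j * (1.0 - p1) ** (n - j)
               for j in range(k, n - k + 1))

n, k = 60, 9                      # k/n = 0.15 < 1/6
exact = prob_both_classes_have_k(n, k)
chebyshev_bound = 1.0 - 18.0 / n  # lower bound obtained from Chebyshev's inequality
print(exact, chebyshev_bound)
```

The exact probability is much closer to one than the bound suggests, so the failure of the rule in this example sets in already for moderate $n$.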

A final remark is in order here. A comparison of the conditions of convergence in theorems 5.3 and 5.5 shows that $k_n$ plays the role of $n h_n^m$. Of course, this is not a total surprise. Indeed, for a given $x \in R^m$ and kernel $K(y) = I\{\|y\| \le 1\}$, the decision functions satisfying (5.10) define a majority rule, that is, for each $n$, $\hat\theta$ equals $j$ if $j$ is the state of the majority of the $\theta_i$ for which $\|X_i - x\| \le h_n$. The number of observations for which $\|X_i - x\| \le h_n$ is probabilistically proportional to $n h_n^m$. With the Loftsgaarden-Quesenberry version (5.16), the same majority rule is obtained except that the number of observations influencing $\hat\theta_{V_n,x}$ is now probabilistically proportional to $k_n$.
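The correspondence between the two majority rules can be illustrated with a small sketch. The data model (two Gaussian classes in the plane), the helper names and the tie-breaking convention (smallest state wins) are assumptions made for the illustration only; they are not taken from (5.10) or (5.16). The sketch sets $k$ equal to the number of observations that fall in the window of radius $h$, in which case both rules vote over the same observations.

```python
import math
import random

def dist(u, v):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def majority(states, M):
    """Most frequent state in 1..M; ties go to the smallest state (assumed convention)."""
    counts = [0] * (M + 1)
    for s in states:
        counts[s] += 1
    return max(range(1, M + 1), key=lambda j: counts[j])

def window_rule(data, x, h, M):
    """Majority vote over all (X_i, theta_i) with ||X_i - x|| <= h."""
    inside = [s for (xi, s) in data if dist(xi, x) <= h]
    return majority(inside, M), len(inside)

def knn_rule(data, x, k, M):
    """Majority vote over the k observations nearest to x."""
    nearest = sorted(data, key=lambda p: dist(p[0], x))[:k]
    return majority([s for (_, s) in nearest], M)

random.seed(0)
n, M, h = 2000, 2, 0.3
data = []
for _ in range(n):
    s = random.randint(1, M)
    data.append(((random.gauss(1.5 * (s - 1), 1.0), random.gauss(0.0, 1.0)), s))

x = (0.0, 0.0)
state_w, n_in = window_rule(data, x, h, M)
state_k = knn_rule(data, x, n_in, M)   # k chosen as the window count
print(n_in, state_w, state_k)
```

With $k$ tied to the window count the two decisions agree; in general the window rule fixes the radius and lets the count vary, while the nearest neighbor rule fixes the count and lets the radius vary.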


5.5  Proofs 

Assume throughout that the distribution (2.1) of $(X,\theta)$ is such that there exist $\mu$-almost everywhere continuous versions $p_1,\ldots,p_M$ in (2.6). We first prove three lemmas that are of general interest in type $C_3$ discrimination problems. Consider first

$$g(x,\eta) = P\{\|X - x\| \le \eta\}$$

where $\eta > 0$, $\|\cdot\|$ is a norm of $R^m$, $X$ is a random vector taking values in $R^m$ with probability measure $\mu$, and $x \in R^m$. The following is true.

Lemma 5.1. For all $\eta > 0$, $g(x,\eta)$ is a Borel measurable function of $x$, and

$$\lim_{b \to 0} P\{g(X,\eta) \le b\} = 0.$$

Proof of lemma 5.1

It suffices to show that $P\{g(X,\eta) = 0\} = 0$. Let

$$S(x,t) = \{y : y \in R^m,\ \|y - x\| \le t\}$$

and note that $A = \{x : g(x,\eta) = 0\} = \{x : P\{X \in S(x,\eta)\} = 0\}$. Since $R^m$ is separable, there exists a countable dense subset $D$ of $R^m$. Since $D$ is dense, we can find for each $x$ in $A$ a $d(x) \in D$ such that $d(x) \in S(x,\eta/3)$. But

$$A \subset \bigcup_{\{y :\ y \in D,\ y = d(x) \text{ for some } x \in A\}} S(y,\eta/2)$$

which is a countable union of $\mu$-null sets. This proves the second part of lemma 5.1. To prove that $g(x,\eta)$ is a Borel measurable function of $x$, we argue as follows. Let $\mathcal{B}^m$ be the $\sigma$-algebra of the Borel sets of $R^m$ and let $\mathcal{B}^{2m}$ be the product $\sigma$-algebra of $\mathcal{B}^m$ and $\mathcal{B}^m$. Consider the product probability space $(R^{2m}, \mathcal{B}^{2m}, \mu \times \mu)$ and the following Borel set in $\mathcal{B}^{2m}$,

$$B = \{(x,y) : x,y \in R^m,\ \|y - x\| \le \eta\}.$$

We know that all the sections

$$B_z = \{(x,y) : x,y \in R^m,\ x = z,\ \|y - x\| \le \eta\},\quad z \in R^m,$$

are Borel sets of $\mathcal{B}^m$ (Loeve, 1963, pp. 134) and that $\mu(B_z)$ is a Borel measurable function of $z$ (Loeve, 1963, pp. 135). Q.E.D.
Another quantity of some practical interest is

$$r(x,\eta) = \inf\{\|y - x\| : |p_i(y) - p_i(z)| \ge \eta \text{ for some } 1 \le i \le M \text{ and some } y,z \in R^m \text{ with } \|z - x\| \le \|y - x\|\}$$

where $x \in R^m$ and $\eta > 0$. Note that $r$ depends upon the particular version $(p_1,\ldots,p_M)$ that is chosen in (2.6). It is clear that for any $x,y \in R^m$ and any $\eta > 0$,

$$|r(x,\eta) - r(y,\eta)| \le \|y - x\|$$

so that $r$ is a Borel measurable function of $x$. Furthermore, the following is true in view of the Lebesgue dominated convergence theorem.

Lemma 5.2. For all $\eta > 0$,

$$\lim_{b \to 0} P\{r(X,\eta) \le b\} = 0$$

provided that the $p_1,\ldots,p_M$ in (2.6) that are used in the definition of $r$ are $\mu$-almost everywhere continuous.

Proof of lemma 5.2

Let $C$ be the subset of $R^m$ on which all the $p_i$, $1 \le i \le M$, are continuous. Since $P\{X \in C\} = 1$ by our choice of $p_1,\ldots,p_M$, we have by the Lebesgue dominated convergence theorem,

$$\lim_{b \to 0} \int_{\{x :\ x \in C,\ r(x,\eta) \le b\}} \mu(dx) = \lim_{b \to 0} P\{X \in C;\ r(X,\eta) \le b\} = P\{X \in C;\ r(X,\eta) = 0\} = P\{\emptyset\} = 0.$$

Q.E.D.

Another important tool is the inequality given in lemma 5.3. Given $V_n$ and $x \in R^m$, let $V_n^x = (X_1^x,\theta_1^x),\ldots,(X_n^x,\theta_n^x)$ be a permutation of $V_n$ such that $\|X_1^x - x\| \le \|X_2^x - x\| \le \ldots \le \|X_n^x - x\|$. Notice that $V_n^x$ is not uniquely defined if ties occur. We assume that there is a method, either deterministic or random, to break the ties and obtain $V_n^x$ from $V_n$. For any tiebreaking method, possibly depending upon $V_n$, the following is true.

Lemma 5.3. If $k_n$ is an integer with $1 \le k_n \le n$ and if $c > 0$, $b > 0$ and $g(x,c) > b$ where $x \in R^m$, and if $k_n/n \le b/2$, then

$$P\{\|X_{k_n}^x - x\| > c\} \le e^{-nb/10}.$$

Proof of lemma 5.3

$$P\{\|X_{k_n}^x - x\| > c\} \le P\Big\{\sum_{i=1}^n I\{\|X_i - x\| \le c\} < k_n\Big\} \le P\Big\{\frac{1}{n}\sum_{i=1}^n \big(I\{\|X_i - x\| \le c\} - P\{\|X_1 - x\| \le c\}\big) < k_n/n - b\Big\}.$$

Since $k_n/n - b \le -b/2$, we can upper bound the last term using Bennett's inequality (Bennett, 1962) by

$$e^{-n(b/2)^2/(2b + b/2)} = e^{-nb/10}.$$

Q.E.D.
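A quick numerical sanity check of lemma 5.3 can be sketched as follows. The event $\{\|X_{k_n}^x - x\| > c\}$ occurs exactly when fewer than $k_n$ of the $n$ observations fall within distance $c$ of $x$, so its probability is a binomial tail with success probability $g(x,c)$. The particular values of $n$, $b$ and $g(x,c)$ below are arbitrary choices for illustration, not values from the text.

```python
import math

def binom_prob_below(n, p, k):
    """Exact P{Bin(n, p) < k}: fewer than k of n points land within c of x."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

def lemma_bound(n, b):
    """Right-hand side of lemma 5.3."""
    return math.exp(-n * b / 10)

n, b = 200, 0.25
p = 0.30               # plays the role of g(x, c); must exceed b
k = int(n * b / 2)     # largest k_n allowed by k_n / n <= b / 2

tail = binom_prob_below(n, p, k)
bound = lemma_bound(n, b)
print(tail, bound)
```

For these values the exact tail is far below the bound, which is expected since the lemma trades tightness for a simple distribution-free form.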

Proof of theorem 5.1

For convenience, let $M=2$. Given a version $(p_1,p_2)$ of (2.6) such that $p_1$ and $p_2$ are $\mu$-almost everywhere continuous, we can define the following sets. Let

$$U = \{x : x \in R^m;\ p_1(x) + p_2(x) = 1\},$$

$$D^i = \{x : x \in R^m;\ p_i(x) = \max(p_1(x),p_2(x))\},\quad i = 1,2,$$

$$A(a) = \{x : x \in R^m;\ |p_1(x) - p_2(x)| \ge a\},$$

and

$$B(b,c,\eta) = \{x : x \in R^m;\ g(x,c) > b;\ r(x,\eta) > c\}$$

where $a > 0$, $b > 0$, $c > 0$ and $\eta > 0$. Let $\epsilon > 0$ be arbitrary. Then find an $a$ small enough so that

$$P\{X \notin (D^1 \cap D^2) \cup A(a)\} < \epsilon/4$$

and find $b$ and $c$ small enough so that

$$P\{X \notin B(b,c,a/4)\} < \epsilon/4.$$

Let $G = U \cap ((D^1 \cap D^2) \cup A(a)) \cap B(b,c,a/4)$ and note that $P\{X \notin G\} < \epsilon/2$.

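If $L_n(\cdot)$ and $L^*(\cdot)$ are defined as in (2.28), (2.30) by means of the same $p_1,p_2$, then we have

$$L_n(X) - L^*(X) \le I\{X \notin G\} + (L_n(X) - L^*(X))\, I\{X \in G\}.$$

Taking conditional expectations yields, wp1,

$$L_n - L^* \le P\{X \notin G\} + E\{(L_n(X) - L^*(X))\, I\{X \in G\} \mid V_n\}.$$

By Markov's inequality and (2.28), (2.30),

$$P\{L_n - L^* \ge \epsilon\} \le P\{E\{(L_n(X) - L^*(X))\, I\{X \in G\} \mid V_n\} \ge \epsilon/2\}$$

$$\le (2/\epsilon)\, E\{(L_n(X) - L^*(X))\, I\{X \in G\}\}$$

$$\le (2/\epsilon) \sup_{x \in G} E\{L_n(x) - L^*(x)\}$$

$$\le (2/\epsilon) \sup_{x \in G} |p_1(x) - p_2(x)|\, E\{|\delta_1^*(x) - \delta_{n1}(V_n,x)|\}$$

$$\le (2/\epsilon) \sup_{x \in U \cap A(a) \cap B(b,c,a/4)} E\{|\delta_1^*(x) - \delta_{n1}(V_n,x)|\}$$

$$\le (2/\epsilon) \sup_{i=1,2}\ \sup_{x \in U \cap D^i \cap A(a) \cap B(b,c,a/4)} P\{\delta_{ni}(V_n,x) < 1\}.$$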

We will show that there exists an $N$ such that for all $n \ge N$ and all $x$ in $U \cap D^i \cap A(a) \cap B(b,c,a/4)$,

$$P\{\delta_{nj}(V_n,x) > 0\} \le e^{-nb/10} + e^{-(a^2/(128+8a))/\sup_i v_{ni}},\quad i = 1,2;\ j = 1,2;\ j \ne i.$$

This then would imply that for all $n$ large enough,

$$P\{L_n - L^* \ge \epsilon\} \le \big(e^{-nb/10} + e^{-(a^2/(128+8a))/\sup_i v_{ni}}\big)(2/\epsilon)$$

from which theorem 5.1 follows by the arbitrariness of $\epsilon$ and the Borel-Cantelli lemma.

So, we pick an $x$ in $U \cap D^1 \cap A(a) \cap B(b,c,a/4)$ and we note

$$P\{\delta_{n2}(V_n,x) > 0\} \le P\{\|X_{k_n}^x - x\| > c\} + P\{W_2^x \ge W_1^x;\ \|X_{k_n}^x - x\| \le c\}$$

where $\{k_n\}$ is a sequence of integers satisfying (5.4)-(5.5). By lemma 5.3 and (5.4) we have $k_n/n \le b/2$ for all $n$ sufficiently large so that

$$P\{\|X_{k_n}^x - x\| > c\} \le e^{-nb/10}.$$

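Next, introduce the random variables $Y_1^x,\ldots,Y_n^x$ where

$$Y_i^x = I\{\theta_i^x = 2\} - I\{\theta_i^x = 1\},\quad 1 \le i \le n.$$

Note that on $\{\|X_{k_n}^x - x\| \le c\}$, we have

$$E\{Y_i^x \mid X_i^x\} = P\{\theta_i^x = 2 \mid X_i^x\} - P\{\theta_i^x = 1 \mid X_i^x\} \le p_2(x) + a/4 - (p_1(x) - a/4) \le a/2 - a = -a/2 \text{ wp1},\quad 1 \le i \le k_n,$$

where we used the fact that $r(x,a/4) > c$ and $p_1(x) \ge p_2(x) + a$. So,

$$P\{W_2^x \ge W_1^x;\ \|X_{k_n}^x - x\| \le c\} = P\Big\{\sum_{i=1}^n v_{ni} Y_i^x \ge 0;\ \|X_{k_n}^x - x\| \le c\Big\}$$

$$\le P\Big\{\sum_{i=1}^n v_{ni}(Y_i^x - E\{Y_i^x \mid X_i^x\}) > a/8;\ \|X_{k_n}^x - x\| \le c\Big\}$$

$$+ P\Big\{\sum_{i=1}^{k_n} v_{ni} E\{Y_i^x \mid X_i^x\} > -a/4;\ \|X_{k_n}^x - x\| \le c\Big\}$$

$$+ P\Big\{\sum_{i=k_n+1}^{n} v_{ni} E\{Y_i^x \mid X_i^x\} > a/8;\ \|X_{k_n}^x - x\| \le c\Big\}.$$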

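Clearly, $|E\{Y_i^x \mid X_i^x\}| \le 1$ with probability one. Thus, by (5.5),

$$P\Big\{\sum_{i=k_n+1}^{n} v_{ni} E\{Y_i^x \mid X_i^x\} > a/8;\ \|X_{k_n}^x - x\| \le c\Big\} \le P\Big\{\sum_{i=k_n+1}^{n} v_{ni} > a/8\Big\} = 0$$

for all $n$ large enough. Similarly, since $E\{Y_i^x \mid X_i^x\} \le -a/2$ for $1 \le i \le k_n$ on the event considered,

$$P\Big\{\sum_{i=1}^{k_n} v_{ni} E\{Y_i^x \mid X_i^x\} > -a/4;\ \|X_{k_n}^x - x\| \le c\Big\} \le P\Big\{-(a/2)\sum_{i=1}^{k_n} v_{ni} > -a/4\Big\}$$

$$= P\Big\{\Big(1 - \sum_{i=k_n+1}^{n} v_{ni}\Big)a/2 < a/4\Big\} = P\Big\{\sum_{i=k_n+1}^{n} v_{ni} > 1/2\Big\} = 0$$

for all $n$ large enough by (5.5).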


Notice that given $X_1^x,\ldots,X_n^x$, the $Y_i^x - E\{Y_i^x \mid X_i^x\}$ are independent, zero mean random variables. Notice further that given $X_1^x,\ldots,X_n^x$,

$$\sup_i\ \operatorname{ess\,sup}\ \big(v_{ni}(Y_i^x - E\{Y_i^x \mid X_i^x\})\big) \le \sup_i v_{ni}$$

and

$$\sum_{i=1}^n v_{ni}^2\, E\{(Y_i^x - E\{Y_i^x \mid X_i^x\})^2 \mid X_i^x\} \le \sup_i v_{ni}$$

where the inequalities are with probability one. Therefore, by the one-sided inequality of Bennett for independent, but not identically distributed random variables (Bennett, 1962; Hoeffding, 1963; Fuk and Nagaev, 1971),

$$P\Big\{\sum_{i=1}^n v_{ni}(Y_i^x - E\{Y_i^x \mid X_i^x\}) > a/8 \ \Big|\ X_1^x,\ldots,X_n^x\Big\}$$

$$\le e^{-n(a/8n)^2/((2/n)\sup_i v_{ni} + (a/8n)\sup_i v_{ni})} = e^{-(a^2/(128+8a))/\sup_i v_{ni}}$$

with probability one. Taking expectations and collecting bounds yields the inequality given at the outset of the proof of theorem 5.1.

Q.E.D.

Proof of theorem 5.3

For convenience, let $M=2$. Find two $\mu$-almost everywhere continuous densities $f_1$ and $f_2$ corresponding to $\mu_1$ and $\mu_2$. Let $f = \pi_1 f_1 + \pi_2 f_2$ and let for all $x \in R^m$ and $\eta > 0$,

$$r(x,\eta) = \inf\{\|y - x\| : |\pi_i f_i(y) - \pi_i f_i(z)| \ge \eta \text{ for some } i = 1,2 \text{ and some } y,z \in R^m \text{ with } \|z - x\| \le \|y - x\|\}.$$

By lemma 5.2, since the $\pi_i f_i$ are $\mu$-almost everywhere continuous, we know that

$$\lim_{b \to 0} P\{r(X,\eta) \le b\} = 0.$$

Since $r$ is Lipschitz, it is obviously a Borel measurable function of $x$. Of course, we also know that

$$\lim_{b \to \infty} P\{f(X) > b\} = 0.$$

Define the following sets,

$$D^i = \{x : x \in R^m,\ \pi_i f_i(x) = \max(\pi_1 f_1(x), \pi_2 f_2(x))\},\quad i = 1,2,$$

$$A(a) = \{x : x \in R^m,\ |\pi_1 f_1(x) - \pi_2 f_2(x)| \ge a\},$$

and

$$B(b,c,\eta) = \{x : x \in R^m,\ f(x) \le b,\ r(x,\eta) > c\}$$

where $a > 0$, $b > 0$, $c > 0$ and $\eta > 0$.
Let $\epsilon > 0$ be arbitrary. Find an $a > 0$ small enough such that

$$P\{X \notin (D^1 \cap D^2) \cup A(a)\} < \epsilon/4$$

and find $b > 0$ large enough and $c > 0$ small enough such that

$$P\{X \notin B(b,c,a/4)\} < \epsilon/4.$$

Let $G = ((D^1 \cap D^2) \cup A(a)) \cap B(b,c,a/4)$ and note that $P\{X \notin G\} < \epsilon/2$. If $p_1$ and $p_2$ are defined as in (2.15) and if $L_n(\cdot)$ and $L^*(\cdot)$ are defined by means of these $p_1,p_2$ (see (2.28), (2.30)), then we have

$$L_n(X) - L^*(X) \le I\{X \notin G\} + (L_n(X) - L^*(X))\, I\{X \in G\}.$$

Taking conditional expectations yields, with probability one,

$$L_n - L^* \le P\{X \notin G\} + E\{(L_n(X) - L^*(X))\, I\{X \in G\} \mid V_n\}.$$

By Markov's inequality and (2.28), (2.30),

$$P\{L_n - L^* \ge \epsilon\} \le P\{E\{(L_n(X) - L^*(X))\, I\{X \in G\} \mid V_n\} \ge \epsilon/2\}$$

$$\le (2/\epsilon)\, E\{(L_n(X) - L^*(X))\, I\{X \in G\}\}$$

$$\le (2/\epsilon) \sup_{x \in G} E\{L_n(x) - L^*(x)\}$$

$$\le (2/\epsilon) \sup_{x \in G} |p_1(x) - p_2(x)|\, E\{|\delta_1^*(x) - \delta_{n1}(V_n,x)|\}$$

$$\le (2/\epsilon) \sup_{x \in A(a) \cap B(b,c,a/4)} E\{|\delta_1^*(x) - \delta_{n1}(V_n,x)|\}$$

$$\le (2/\epsilon) \sup_{i=1,2}\ \sup_{x \in D^i \cap A(a) \cap B(b,c,a/4)} P\{\delta_{ni}(V_n,x) < 1\}.$$

We will show that there exists an $N > 0$ such that for all $n \ge N$ and all $x$ in $D^i \cap A(a) \cap B(b,c,a/4)$,

$$P\{\delta_{nj}(V_n,x) > 0\} \le e^{-K n h_n^m},\quad i = 1,2;\ j = 1,2;\ j \ne i,$$

where $K > 0$. This proves (5.11). Theorem 5.3 then follows from (5.11) and the Borel-Cantelli lemma.
Let $K_0 = \int K(x)\,dx$, $K_2 = \sup_x K(x)$ and $K_3 = \sup_x \|x\|^m K(x)$. Pick an $x$ in $D^1 \cap A(a) \cap B(b,c,a/4)$ and define

$$W_i = K((X_i - x)/h_n)\,(I\{\theta_i = 2\} - I\{\theta_i = 1\}),\quad 1 \le i \le n.$$

Then,

$$P\{\delta_{n2}(V_n,x) > 0\} \le P\Big\{\Big(\sum_{i=1}^n W_i\Big)/n \ge 0\Big\}$$

$$\le P\Big\{\sum_{i=1}^n (W_i - E\{W_i\})/n \ge K_0 a h_n^m/8\Big\} + P\{E\{W_1\} \ge -K_0 a h_n^m/8\}.$$

Further,

$$E\{W_1/h_n^m\} = \int h_n^{-m} K((y-x)/h_n)\,(\pi_2 f_2(y) - \pi_1 f_1(y))\,dy$$

$$= K_0(\pi_2 f_2(x) - \pi_1 f_1(x)) + \int h_n^{-m} K((y-x)/h_n)\,(\pi_2 f_2(y) - \pi_1 f_1(y))\,dy - \int h_n^{-m} K((y-x)/h_n)\,(\pi_2 f_2(x) - \pi_1 f_1(x))\,dy$$

$$\le -K_0 a + \int_{\{y :\ \|y-x\| \le c\}} (a/4)\, h_n^{-m} K((y-x)/h_n)\,dy + \int_{\{y :\ \|y-x\| > c\}} h_n^{-m} K((y-x)/h_n)\,(\pi_2 f_2(y) - \pi_1 f_1(y) + b)\,dy$$

$$\le -K_0 a + K_0 a/4 + \int_{\{y :\ \|y-x\| > c\}} h_n^{-m} K((y-x)/h_n)\, \pi_2 f_2(y)\,dy + b \int_{\{y :\ \|y\| > c/h_n\}} K(y)\,dy.$$

Choose $n$ so large that the last term is smaller than $K_0 a/4$ and that $h_n^{-m} K(y/h_n) \le K_0 a/4$ for all $\|y\| > c$. This is possible in view of the integrability of $K$, $\|y\|^m K(y) \to 0$ as $\|y\| \to \infty$ and $h_n \to 0$. We thus have that $E\{W_1/h_n^m\} \le -K_0 a/4$ for all $n$ large enough uniformly over $D^1 \cap A(a) \cap B(b,c,a/4)$. Next, by Bennett's inequality (Bennett, 1962),

$$P\Big\{\sum_{i=1}^n (W_i - E\{W_i\}) \ge n K_0 a h_n^m/8\Big\}$$

$$\le \exp\Big(-n (K_0 a h_n^m/8)^2 \big/ \big(2E\{W_1^2\} + (\operatorname{ess\,sup} W_1)\, K_0 a h_n^m/8\big)\Big)$$



Chapter  6 DISTRIBUTION-FREE  ERROR  ESTIMATION 


6.1 Problem Formulation

After the statistician has picked a decision rule $\{\delta_n\}$ and after he has collected data $V_n$, he is of course interested in the performance of his rule, that is, he would like to estimate the conditional probability of error $L_n = P\{\hat\theta_{V_n,X} \ne \theta \mid V_n\} = E\{1 - \delta_{n\theta}(V_n,X) \mid V_n\}$. Because the distribution of $(X,\theta)$ is unknown, there is no way to compute $L_n$. Instead, the statistician will have to use the data $V_n$ to construct an estimate $\hat L_n$ of $L_n$. Ideally, he would like to obtain tight upper bounds for $P\{|\hat L_n - L_n| \ge \epsilon\}$ that do not depend upon the distribution of $(X,\theta)$. This would tell him how much confidence he can put in his estimate $\hat L_n$ without a priori knowledge of (2.1).

To illustrate how nontrivial this problem is, consider the following example. Let $\delta_n = (\delta_{n1},\ldots,\delta_{nM})$ be a constant. Then

$$L_n = 1 - \sum_{j=1}^M \pi_j \delta_{nj}$$

can be estimated by replacing the $\pi_j$ by their natural estimates $N_j/n$ where the $N_j$ are the number of $X_i$'s for which $\theta_i = j$. So,

$$\hat L_n = 1 - \sum_{j=1}^M N_j \delta_{nj}/n$$

and

$$P\{|\hat L_n - L_n| \ge \epsilon\} = P\Big\{\Big|\sum_{j=1}^M (\pi_j - N_j/n)\,\delta_{nj}\Big| \ge \epsilon\Big\} \le \sum_{j=1}^M P\{|\pi_j - N_j/n| \ge \epsilon\} \le 2Me^{-2n\epsilon^2}$$

by Hoeffding's inequality (Hoeffding, 1963). This bound is valid for all distributions of $(X,\theta)$ and all constant decision rules. Of course, constant decision rules are of no practical importance. A slightly better class of decision rules is the class for which

$$\delta_{nj} = 0 \text{ whenever } N_j < \max_{1 \le i \le M} N_i,\quad 1 \le j \le M.$$

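Let for instance $M=2$ and $\delta_{n1} = \alpha$ if $N_1 = N_2$ where $0 \le \alpha \le 1$. Then, $L_n = \pi_1 I\{N_1 < N_2\} + \pi_2 I\{N_1 > N_2\} + (\pi_1(1-\alpha) + \pi_2\alpha)\, I\{N_1 = N_2\}$. If $\hat L_n$ is the estimate obtained by replacing the unknown $\pi_i$ by their estimates $N_i/n$, then

$$|\hat L_n - L_n| = \big|(\pi_1 - N_1/n)\,(I\{N_1 < N_2\} - I\{N_1 > N_2\} + (1 - 2\alpha)\, I\{N_1 = N_2\})\big| \le 2\,|\pi_1 - N_1/n|$$

and

$$P\{|\hat L_n - L_n| \ge \epsilon\} \le 2e^{-n\epsilon^2/2}.$$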

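Thus, we are still able to find a distribution-free upper-bound for $P\{|\hat L_n - L_n| \ge \epsilon\}$. It is obvious that this class of decision rules can be improved upon. If $\{B_1,B_2,\ldots\}$ is a partition of $R^m$ and $N_{jk}$ is the number of $(X_i,\theta_i)$'s with $X_i \in B_k$ and $\theta_i = j$, we could let

$$\delta_{n1}(V_n,x) = \begin{cases} 1 & \text{if } N_{1k} > N_{2k}, \\ \alpha_k & \text{if } N_{1k} = N_{2k}, \\ 0 & \text{if } N_{1k} < N_{2k}, \end{cases} \quad x \in B_k,$$

where $0 \le \alpha_k \le 1$, $k = 1,2,\ldots$. Unfortunately, it is impossible to find distribution-free upper-bounds for $P\{|\hat L_n - L_n| \ge \epsilon\}$ with this rule if $\hat L_n$ is obtained from $L_n$ by replacing the unknown $P\{X \in B_k;\ \theta = j\}$ in the expression of $L_n$ by their natural estimates $N_{jk}/n$. Assume that $M=2$, that $\alpha_k = 0$ for all $k$, that $\pi_2 = 0$ and that $P\{X \in B_k\} = 1/2n$ for $1 \le k \le 2n$ and $P\{X \in B_k\} = 0$ otherwise. Then, at least $n$ of the $2n$ sets contain no observation, so that

$$L_n = \sum_{k=1}^{2n} P\{X \in B_k\}\, I\{N_{1k} = 0\} \ge n/2n = 1/2$$

and $\hat L_n = 0$. This means that for every $n$,

$$\sup_{(\pi_1,\ldots,\pi_M,\mu_1,\ldots,\mu_M)} P\{|\hat L_n - L_n| \ge 1/2\} = 1.$$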

For every distribution of $(X,\theta)$, however, $\hat L_n$ converges to $L_n$ wp1. Indeed, it is not hard to see that

$$P\{|\hat L_n - L_n| \ge \epsilon\} \le P\Big\{\sum_{k=1}^{\infty}\sum_{j=1}^{2} |P\{X \in B_k;\ \theta = j\} - N_{jk}/n| \ge \epsilon\Big\}$$

which we know is upper-bounded by $C_1 e^{-C_2 n}$ where $C_1 > 0$ and $C_2 > 0$ are constants depending upon $\epsilon$ and the distribution of $(X,\theta)$ (see proof of (3.31)). If the natural estimate $\hat L_n$ of $L_n$ is not a distribution-free estimate (in the sense that for every $n$, there exists a $(\pi_1,\ldots,\pi_M,\mu_1,\ldots,\mu_M)$ such that $P\{|\hat L_n - L_n| \ge 1/2\} = 1$), then is it possible at all to find another distribution-free estimate for $L_n$? The answer is yes. In sections 6.2 and 6.3 we will construct another estimate $\hat L_n$ of $L_n$ for which $P\{|\hat L_n - L_n| \ge \epsilon\} \le C/\sqrt{n}$ where $C$ is a constant which depends upon $M$ and $\epsilon$ only and which is independent of the distribution of $(X,\theta)$, of the $\alpha_k$ and of the partition $\{B_1,B_2,\ldots\}$.
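The failure of the natural estimate for the partition rule can be reproduced numerically. The sketch below assumes the setup described above ($M=2$, $\pi_2 = 0$, $2n$ equiprobable sets, and the tie value $\alpha_k = 0$, so that state 2 is chosen on every empty set); the function name and the way the sets are indexed are illustrative choices.

```python
import random

def simulate(n, seed=0):
    """Return (L_n, natural estimate) for one sample of size n."""
    rng = random.Random(seed)
    cells = 2 * n
    counts = [0] * cells                 # N_{1k}; N_{2k} = 0 because pi_2 = 0
    for _ in range(n):
        counts[rng.randrange(cells)] += 1
    # delta_{n1} on B_k: 1 if N_1k > N_2k, alpha_k = 0 on ties (the empty sets)
    delta1 = [1.0 if c > 0 else 0.0 for c in counts]
    # true conditional error: sum over sets of P{X in B_k} * (1 - delta_{n1})
    L_n = sum((1.0 / cells) * (1.0 - d) for d in delta1)
    # natural estimate: P{X in B_k; theta = 1} replaced by N_1k / n
    L_hat = sum((c / n) * (1.0 - d) for c, d in zip(counts, delta1))
    return L_n, L_hat

for n in (5, 50, 500):
    print(n, simulate(n))
```

Whatever the sample, at most $n$ sets are occupied, so the true error stays at or above $1/2$ while the natural estimate is identically zero.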

In section 6.2 several estimates $\hat L_n$ are presented and some general purpose inequalities are proved relating $|\hat L_n - L_n|$ to $\delta_n$. In the remaining sections, the error estimation problem is discussed separately for three large classes of decision rules,

(i) linear discrimination rules,

(ii) two-step rules

and

(iii) linear ordering rules and $\{k_n\}$-local rules.

All these rules except the $\{k_n\}$-local rules can be put into the

$L^*$ although it is obvious that $P\{|\hat L_n - L^*| \ge \epsilon\}$ does not tend to 0 as $n \to \infty$ uniformly over all distributions (2.1) because $P\{|L_n - L^*| \ge \epsilon\}$ does not converge to 0 uniformly over all distributions (2.1). We will therefore not study the properties of $\hat L_n$ as an estimate of $L^*$.


6.2  The  Resubstitution, Deleted  and  Holdout  Estimates 

The oldest estimation technique discussed in the literature is the resubstitution estimate (see Toussaint (1974) for a survey of the literature on the estimation of the probability of error). The resubstitution estimate $L_n^R$ is obtained by counting the number of errors if one estimates the states of all the observations $X_i$ using $\delta_n$ and the data. Formally we have

$$L_n^R = \Big(\sum_{i=1}^n I\{\hat\theta_{V_n,X_i} \ne \theta_i\}\Big)/n \tag{6.2}$$

where the $\hat\theta_{V_n,X_i}$, $1 \le i \le n$, are, conditioned on $V_n$, independent random variables with

$$P\{\hat\theta_{V_n,X_i} = j \mid V_n\} = \delta_{nj}(V_n,X_i). \tag{6.3}$$

In general, $L_n^R$ is an optimistic estimate of $L_n$ because $(X_i,\theta_i)$ is contained in the data. By eliminating the randomization in (6.2)-(6.3) we obtain the strictly better estimate

$$L_n^{R'} = \Big(\sum_{i=1}^n (1 - \delta_{n\theta_i}(V_n,X_i))\Big)/n. \tag{6.4}$$

Notice that $L_n^{R'} = E\{L_n^R \mid V_n\}$ and that $(E\{L_n^R \mid V_n\})^2 \le E\{(L_n^R)^2 \mid V_n\}$ by Schwarz's inequality. Thus,

$$E\{(L_n - L_n^{R'})^2 \mid V_n\} = E\{L_n^2 \mid V_n\} - 2E\{L_n L_n^{R'} \mid V_n\} + E\{(L_n^{R'})^2 \mid V_n\} \le E\{(L_n - L_n^R)^2 \mid V_n\}. \tag{6.5}$$
It should be noted that the difference between both estimates is asymptotically negligible and has no bearing upon distribution-free results. To see this, notice that, with probability one,

$$E\{(L_n - L_n^R)^2 \mid V_n\} = E\{(L_n - L_n^{R'})^2 \mid V_n\} + E\{(L_n^R - L_n^{R'})^2 \mid V_n\} \le E\{(L_n - L_n^{R'})^2 \mid V_n\} + 1/4n \tag{6.6}$$

and

$$P\{|L_n^R - L_n^{R'}| \ge \epsilon \mid V_n\} \le 2e^{-2n\epsilon^2},\quad \epsilon > 0, \tag{6.7}$$

by the conditional independence of the $\hat\theta_{V_n,X_i}$ (given $V_n$), Chebyshev's inequality and Hoeffding's inequality (Hoeffding, 1963).
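The two versions of the resubstitution estimate can be sketched in a few lines. The decision function is passed in as a callable returning the vector $(\delta_{n1},\ldots,\delta_{nM})$; the toy rule `prior_rule` (vote for the most frequent state, splitting ties evenly) and the small data set are assumptions made for the example and do not come from the text.

```python
import random

def resubstitution_randomized(delta, data, rng):
    """Randomized version: draw a state from delta(V_n, X_i), count disagreements."""
    errors = 0
    for (x_i, theta_i) in data:
        probs = delta(data, x_i)
        guess = rng.choices(range(1, len(probs) + 1), weights=probs)[0]
        errors += (guess != theta_i)
    return errors / len(data)

def resubstitution(delta, data):
    """Derandomized version: average of 1 - delta_{n,theta_i}(V_n, X_i)."""
    return sum(1.0 - delta(data, x_i)[theta_i - 1] for (x_i, theta_i) in data) / len(data)

def prior_rule(data, x, M=2):
    """Toy rule ignoring x: all mass on the most frequent state, ties split evenly."""
    counts = [sum(1 for (_, t) in data if t == j) for j in range(1, M + 1)]
    best = max(counts)
    winners = [j for j in range(M) if counts[j] == best]
    return [1.0 / len(winners) if j in winners else 0.0 for j in range(M)]

data = [(0.1, 1), (0.4, 1), (0.7, 2), (0.9, 1)]
print(resubstitution(prior_rule, data))
print(resubstitution_randomized(prior_rule, data, random.Random(1)))
```

For a rule whose output is a 0-1 vector, as here, the two versions coincide; they differ only when the decision function randomizes.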

More recently the deleted estimate has become very popular (Lachenbruch, 1967; Cover, 1969; Wagner, 1973; Rogers and Wagner, 1976). Let $V_{ni}$ be the data with $(X_i,\theta_i)$ deleted, $1 \le i \le n$. Given $\delta_n$, we construct a deleted decision function $\delta_n' = (\delta_{n1}',\ldots,\delta_{nM}')$ that is a Borel measurable mapping from $(R^m \times \{1,\ldots,M\})^{n-1} \times R^m$ to $[0,1]^M$ such that

$$\sum_{j=1}^M \delta_{nj}' = 1. \tag{6.8}$$

Conditioned on $V_n$, let $\hat\theta_{V_{ni},X_i}$, $1 \le i \le n$, be independent random variables such that 1) given $V_{ni}$ and $X_i$, $\hat\theta_{V_{ni},X_i}$ and $\theta_i$ are independent and 2)

$$P\{\hat\theta_{V_{ni},X_i} = j \mid V_n\} = \delta_{nj}'(V_{ni},X_i),\quad 1 \le j \le M. \tag{6.9}$$

The statistician can only gain by choosing $\delta_n'$ close to $\delta_n$ although everything that will be said in this section carries through for any $\delta_n$ and any $\delta_n'$. In the special sections on linear ordering rules and two-step rules, we will outline some very natural choices of $\delta_n'$. We define $L_n^D$ by

$$L_n^D = \Big(\sum_{i=1}^n I\{\hat\theta_{V_{ni},X_i} \ne \theta_i\}\Big)/n. \tag{6.10}$$

A strictly better estimate is

$$L_n^{D'} = \Big(\sum_{i=1}^n (1 - \delta_{n\theta_i}'(V_{ni},X_i))\Big)/n \tag{6.11}$$

where, arguing as with the resubstitution estimate, $L_n^{D'} = E\{L_n^D \mid V_n\}$,

$$E\{(L_n - L_n^{D'})^2 \mid V_n\} \le E\{(L_n - L_n^D)^2 \mid V_n\} \le E\{(L_n - L_n^{D'})^2 \mid V_n\} + 1/4n,$$

and

$$P\{|L_n^D - L_n^{D'}| \ge \epsilon \mid V_n\} \le 2e^{-2n\epsilon^2},\quad \epsilon > 0. \tag{6.12}$$

The difference between both estimates is asymptotically negligible and has no bearing upon distribution-free results.
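A minimal sketch of the derandomized deleted estimate follows, assuming the decision function is supplied as a callable that takes the reduced data and a point. The nearest neighbor rule on the real line and the five data points are illustrative assumptions chosen so that the optimism of resubstitution is visible.

```python
def deleted_estimate(delta_prime, data):
    """Average of 1 - delta'_{n,theta_i}(V_ni, X_i), with (X_i, theta_i) left out."""
    total = 0.0
    for i, (x_i, theta_i) in enumerate(data):
        v_ni = data[:i] + data[i + 1:]          # data with the i-th pair deleted
        total += 1.0 - delta_prime(v_ni, x_i)[theta_i - 1]
    return total / len(data)

def resubstitution_estimate(delta, data):
    """Same average, but each X_i is classified with the full data."""
    return sum(1.0 - delta(data, x)[t - 1] for (x, t) in data) / len(data)

def nearest_neighbor_rule(data, x, M=2):
    """Toy rule on the real line: all mass on the state of the nearest X_k."""
    _, state = min(data, key=lambda p: abs(p[0] - x))
    return [1.0 if j == state else 0.0 for j in range(1, M + 1)]

data = [(0.0, 1), (0.1, 1), (1.0, 2), (1.1, 2), (0.6, 1)]
print(resubstitution_estimate(nearest_neighbor_rule, data))  # each point is its own neighbor
print(deleted_estimate(nearest_neighbor_rule, data))
```

With the nearest neighbor rule every observation is its own nearest neighbor, so resubstitution reports no errors, whereas deleting the observation first exposes the misclassified point at 0.6.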

A decision function $\delta_n$ is symmetric if for every $V_n$ and $x$, permutations of the data leave $\delta_n$ unchanged. A deleted decision function $\delta_n'$ is symmetric if for any $V_{n1}$ and $x$, $\delta_n'$ does not change its value if the $(X_i,\theta_i)$ of $V_{n1}$ are permuted. The reader will have no trouble checking that all the $\delta_n$ and $\delta_n'$ that are discussed in this chapter are in fact symmetric. The following theorem, implicit in Rogers and Wagner (1976), is valid for such symmetric $\delta_n$ and $\delta_n'$. It is the main tool in the development of distribution-free upper bounds for $P\{|L_n - L_n^D| \ge \epsilon\}$.

Theorem 6.1. If $\delta_n$ and $\delta_n'$ are symmetric, then

$$E\{(L_n - L_n^D)^2\} \le 1/n + 6E\{|\delta_{n\theta}(V_n,X) - \delta_{n\theta}'(V_{n1},X)|\} \tag{6.13}$$

where $(X,\theta),(X_1,\theta_1),\ldots,(X_n,\theta_n)$ are independent identically distributed random vectors. The inequality remains valid for $L_n^{D'}$.

As a corollary, we have for all $\epsilon > 0$,

$$P\{|L_n - L_n^D| \ge \epsilon\} \le \big(1/n + 6E\{|\delta_{n\theta}(V_n,X) - \delta_{n\theta}'(V_{n1},X)|\}\big)/\epsilon^2 \tag{6.14}$$

and

$$E\{(L_n - L_n^D)^2\} \le 1/n + 6P\{\delta_n(V_n,X) \ne \delta_n'(V_{n1},X)\}. \tag{6.15}$$

Another historically important estimate is the holdout estimate $L_n^H$. Given $V_n$, let $T_n = (X_1,\theta_1),\ldots,(X_{n-s_n},\theta_{n-s_n})$ and let $S_n = (X_{n-s_n+1},\theta_{n-s_n+1}),\ldots,(X_n,\theta_n)$ where $1 \le s_n \le n-1$. $T_n$ is called the training sequence and $S_n$ is the testing sequence. Given $\delta_n$, we construct a holdout decision function $\delta_n' = (\delta_{n1}',\ldots,\delta_{nM}')$ that is a Borel measurable mapping from $(R^m \times \{1,\ldots,M\})^{n-s_n} \times R^m$ to $[0,1]^M$ such that

$$\sum_{j=1}^M \delta_{nj}' = 1. \tag{6.16}$$

Conditioned on $T_n$ and $X_{n-s_n+i}$, $1 \le i \le s_n$, let the $\hat\theta_{T_n,X_{n-s_n+i}}$, $\theta_{n-s_n+i}$, $1 \le i \le s_n$, be independent random variables taking values in $\{1,\ldots,M\}$ with

$$P\{\hat\theta_{T_n,X_{n-s_n+i}} = j \mid V_n\} = \delta_{nj}'(T_n,X_{n-s_n+i}),\quad 1 \le j \le M;\ 1 \le i \le s_n. \tag{6.17}$$

The statistician can only gain by choosing $\delta_n'$ close to $\delta_n$ although everything that will be said in this section carries through for any $\delta_n$ and any $\delta_n'$. The holdout estimate $L_n^H$ is defined as follows,

$$L_n^H = \Big(\sum_{i=1}^{s_n} I\{\hat\theta_{T_n,X_{n-s_n+i}} \ne \theta_{n-s_n+i}\}\Big)/s_n. \tag{6.18}$$

The corresponding estimate without randomization is

$$L_n^{H'} = \Big(\sum_{i=1}^{s_n} (1 - \delta_{n\theta_{n-s_n+i}}'(T_n,X_{n-s_n+i}))\Big)/s_n. \tag{6.19}$$
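The derandomized holdout estimate splits the data once and averages the error indicator over the testing sequence only. The sketch below assumes the decision function is a callable taking the training sequence and a point; the nearest neighbor rule and the data are illustrative choices, not part of the text.

```python
def holdout_estimate(delta_prime, data, s):
    """Train on the first n - s pairs, average 1 - delta'_{n,theta} over the last s."""
    n = len(data)
    if not 1 <= s <= n - 1:
        raise ValueError("need 1 <= s <= n - 1")
    training, testing = data[:n - s], data[n - s:]
    return sum(1.0 - delta_prime(training, x)[theta - 1] for (x, theta) in testing) / s

def nearest_neighbor_rule(data, x, M=2):
    """Toy rule on the real line: all mass on the state of the nearest X_k."""
    _, state = min(data, key=lambda p: abs(p[0] - x))
    return [1.0 if j == state else 0.0 for j in range(1, M + 1)]

data = [(0.0, 1), (1.0, 2), (0.1, 1), (0.9, 2), (0.6, 1)]
print(holdout_estimate(nearest_neighbor_rule, data, s=2))
```

Only $s_n$ terms enter the average, which is why the variance terms in the inequalities for the holdout estimate involve $s_n$ rather than $n$.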


It is not hard to show that with probability one, the following inequalities hold true,

$$L_n^{H'} = E\{L_n^H \mid V_n\},$$

$$E\{(L_n - L_n^{H'})^2 \mid V_n\} \le E\{(L_n - L_n^H)^2 \mid V_n\} \le E\{(L_n - L_n^{H'})^2 \mid V_n\} + 1/4s_n, \tag{6.20}$$

and

$$P\{|L_n^H - L_n^{H'}| \ge \epsilon \mid V_n\} \le 2e^{-2s_n\epsilon^2},\quad \epsilon > 0.$$

Note that if $s_n \to \infty$, the difference between both estimates becomes asymptotically negligible. The following theorem will be very useful in the development of distribution-free upper-bounds for $P\{|L_n - L_n^H| \ge \epsilon\}$.

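Theorem 6.2. For all decision functions $\delta_n$ and holdout decision functions $\delta_n'$ and for all $\epsilon > 0$,

$$E\{(L_n - L_n^H)^2\} \le 1/2s_n + 2E\{|\delta_{n\theta}(V_n,X) - \delta_{n\theta}'(T_n,X)|^2\} \le 1/2s_n + 2E\{|\delta_{n\theta}(V_n,X) - \delta_{n\theta}'(T_n,X)|\}, \tag{6.21}$$

$$P\{|L_n - L_n^H| \ge \epsilon\} \le 2e^{-s_n\epsilon^2/2} + (4/\epsilon^2)\, E\{|\delta_{n\theta}(V_n,X) - \delta_{n\theta}'(T_n,X)|^2\} \tag{6.22}$$

and

$$P\{|L_n - L_n^H| \ge \epsilon\} \le 2e^{-s_n\epsilon^2/2} + (2/\epsilon)\, E\{|\delta_{n\theta}(V_n,X) - \delta_{n\theta}'(T_n,X)|\}. \tag{6.23}$$

Theorem 6.2 remains valid for $L_n^{H'}$.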


Notice the similarity between theorem 6.1 for the deleted estimate and the inequality (6.21) for the holdout estimate. Theorems 6.1 and 6.2 imply that in the search for upper bounds for $E\{(L_n - L_n^D)^2\}$ and $E\{(L_n - L_n^H)^2\}$ it suffices to find upper bounds for $E\{|\delta_{n\theta}(V_n,X) - \delta_{n\theta}'(V_{n1},X)|\}$ and $E\{|\delta_{n\theta}(V_n,X) - \delta_{n\theta}'(T_n,X)|\}$. This is exactly what we will attempt to do in the following sections.


6.3 Two-Step Rules

In chapters 2 and 5 we defined two-step rules as rules that are constructed in two stages. In the first stage an estimate is constructed for some density or some atomic measure, and in the second stage these estimates are employed to find a decision function. In section 5 we have pointed out how the density estimate can be modified so that a more natural two-step rule is obtained. These natural two-step rules are the object of this section. We say that $\{\delta_n\}$ is a two-step rule if all the $Y_{nj}$ satisfy

$$Y_{nj}(V_n,x) = \sum_{i=1}^n w_j^n(X_i,x)\, I\{\theta_i = j\} \tag{6.24}$$

where the $w_j^n$ are Borel measurable mappings from $R^m \times R^m$ to $R$, $1 \le j \le M$, $n \ge 1$. If

$$w_j^n(y,x) = K((y-x)/h_n),\quad x,y \in R^m, \tag{6.25}$$

where $\{h_n\}$ is a sequence of positive numbers and $K$ is a Borel measurable mapping from $R^m$ to $[0,\infty)$, then we obtain the natural modification of the distance weighted decision rule of section 5.3. In particular, if $K(x) = I\{\|x\| \le 1\}$, then we obtain a majority rule over all the $(X_i,\theta_i)$ for which $\|X_i - x\| \le h_n$ (Fix and Hodges, 1951; Sebestyen, 1962). The histogram decision rule of section 2.3 is also a two-step rule with the definition (6.24). Indeed, if $\{B_1,B_2,\ldots\}$ is a countable partition of $R^m$, and if

$$w_j^n(y,x) = \sum_k I\{x \in B_k\}\, I\{y \in B_k\}, \tag{6.26}$$

the resulting decision function lets $\hat\theta_{V_n,x} = j$ if $x$ takes a value in $B_k$ and if the majority of the observations $X_i$ with $X_i \in B_k$ have states $\theta_i = j$.
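The structure of a two-step rule can be sketched on the real line: first compute the scores $Y_{nj}$ as weighted counts, then pick a state with maximal score. The window weight follows (6.25) with $K(u) = I\{|u| \le 1\}$; the histogram weight assumes cells of equal width, which is only one possible partition, and the tie-breaking convention (smallest state) is likewise an assumption of this sketch.

```python
import math

def window_weight(h):
    """Weight of the form K((y - x)/h) with K(u) = I{|u| <= 1}."""
    return lambda y, x: 1.0 if abs(y - x) <= h else 0.0

def histogram_weight(width):
    """Weight 1 when x and y share a cell [k*width, (k+1)*width) (assumed partition)."""
    return lambda y, x: 1.0 if math.floor(y / width) == math.floor(x / width) else 0.0

def scores(w, data, x, M):
    """Y_nj(V_n, x) = sum_i w(X_i, x) I{theta_i = j}, j = 1..M, same w for every j."""
    y = [0.0] * M
    for (x_i, theta_i) in data:
        y[theta_i - 1] += w(x_i, x)
    return y

def decide(w, data, x, M):
    """Second stage: a state with maximal score, ties going to the smallest state."""
    y = scores(w, data, x, M)
    return 1 + max(range(M), key=lambda j: y[j])

data = [(0.1, 1), (0.2, 1), (0.35, 2), (0.8, 2), (0.9, 2)]
print(scores(window_weight(0.25), data, 0.3, 2))
print(decide(histogram_weight(0.5), data, 0.7, 2))
```

Both weights take values in $\{0,1\}$, so each $Y_{nj}$ is simply a count of the observations of state $j$ near $x$.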

The property we will exploit in this section is that every $Y_{nj}$ is the sum of $n$ independent identically distributed random variables. We will derive upper-bounds for $P\{|L_n^D - L_n| \ge \epsilon\}$ and $P\{|L_n^H - L_n| \ge \epsilon\}$ that depend upon $n$, $\epsilon$, $M$ and $c_n$ where $c_n \ge 1$ is a characteristic of the functions $w_j^n$, $1 \le j \le M$. We say that $c_n$ is the ratio range of the $w_j^n$ if the range of all the $w_j^n$ is contained in $\{0\} \cup [a_n,b_n]$ for some $0 < a_n \le b_n < \infty$ with $b_n/a_n = c_n$. The interesting feature of these bounds is that they do not depend upon the distribution of $(X,\theta)$ or upon the nature of the mappings $w_j^n$, $1 \le j \le M$, provided that their ratio range is $c_n$ or less. We will see that for every $n$ we can find simple functions $w_j^n$ with a finite ratio range $c_n$ and a distribution of $(X,\theta)$ such that

$$P\{|L_n^D - L_n| \ge 1/3\} > 1/e^2.$$

So, the knowledge of $c_n$ is sufficient for the statistician to obtain distribution-free upper-bounds for $P\{|L_n^D - L_n| \ge \epsilon\}$ and no distribution-free upper-bound exists that holds uniformly over all ratio ranges. The yet unsolved problem is the following: if the $w_j^n$ are such that $c_n = \infty$, can one still find distribution-free upper-bounds for $P\{|L_n^D - L_n| \ge \epsilon\}$ that depend upon some characteristic other than the ratio range?

The situation is even worse for the resubstitution estimate $L_n^R$. We will see that for the most common mappings $w_j^n$, $1 \le j \le M$, with $c_n = 1$, it is possible to find a distribution of $(X,\theta)$ such that

$$P\{|L_n - L_n^R| \ge 1/4\} = 1.$$

Because the statistician does not know $(\pi_1,\ldots,\pi_M,\mu_1,\ldots,\mu_M)$, he will in those simple cases never be sure, no matter how large $n$ is, whether the resubstitution estimate $L_n^R$ is close to $L_n$. The conclusion therefore is that for two-step rules, the deleted estimate seems at this moment to be the best choice as an estimate of $L_n$.

Let the $w_j^n$ be as in (6.25) with $K(x) = I\{\|x\| \le 1\}$, or as in (6.26) with any choice of $\{B_1,B_2,\ldots\}$. In both cases the ratio range is $c_n = 1$. However, for both trivial examples and for all the possible mappings $\xi_n$, it is possible to find distributions of $(X,\theta)$ such that $L_n - L_n^R \ge 1/4$.

Theorem 6.3. Let $w_j^n$, $1 \le j \le M$, be as in (6.25) with $K(x) = I\{\|x\| \le 1\}$ or as in (6.26) with any choice of $\{B_1,B_2,\ldots\}$, and let $\xi_n$ be arbitrary. Then it is possible to find a distribution of $(X,\theta)$ such that $L_n - L_n^R \ge 1/4$.

Theorem 6.3 implies that even for the simplest two-step rule, i.e., one for which all the $w_j^n$ take values in $\{0,1\}$, $1 \le j \le M$, $n \ge 1$,

$$\sup_{(\pi_1,\ldots,\pi_M,\mu_1,\ldots,\mu_M)} P\{|L_n^R - L_n| \ge 1/4\} = 1,\quad n \ge 1. \tag{6.27}$$
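The kind of failure described by theorem 6.3 can be illustrated with one particular distribution; it is chosen here for convenience and is not claimed to be the one used in the proof. Take $X$ uniform on a very long interval and $\theta$ a fair coin independent of $X$. Every rule then has $L_n = 1/2$, while the window majority rule with a small radius sees only the observation itself at each $X_i$ and so reports no resubstitution errors.

```python
import random

def window_votes(data, x, h, M=2):
    """Counts of each state among the observations within distance h of x."""
    votes = [0] * M
    for (x_i, theta_i) in data:
        if abs(x_i - x) <= h:
            votes[theta_i - 1] += 1
    return votes

def resubstitution_error(data, h):
    """Fraction of the X_i misclassified by the window majority rule built on all data."""
    errors = 0
    for (x_i, theta_i) in data:
        votes = window_votes(data, x_i, h)
        guess = 1 + max(range(len(votes)), key=lambda j: votes[j])
        errors += (guess != theta_i)
    return errors / len(data)

rng = random.Random(0)
n, h = 200, 1e-3
# X uniform on [0, 1e6]; theta independent of X and equally likely to be 1 or 2
data = [(rng.uniform(0.0, 1e6), rng.randint(1, 2)) for _ in range(n)]

L_R = resubstitution_error(data, h)
L_n = 0.5   # theta is independent of (X, V_n), so any rule errs with probability 1/2
print(L_R, L_n - L_R)
```

The gap of $1/2$ does not shrink with $n$ as long as the interval is long enough for the windows to stay disjoint, which is why no distribution-free bound can hold for the resubstitution estimate of this rule.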


Let us turn now to the deleted estimate. To define the deleted estimate, we first have to construct a deleted decision function $\delta_n' = (\delta_{n1}',\ldots,\delta_{nM}')$ that is close to $\delta_n$. The statistician knows the $w_j^n$ and he is using $\xi_n$. Thus, he can approximate $Y_n$ by $Y_n' = (Y_{n1}',\ldots,Y_{nM}')$ where

$$Y_{nj}'(V_{ni},x) = \sum_{k=1,\ k \ne i}^{n} w_j^n(X_k,x)\, I\{\theta_k = j\},\quad 1 \le j \le M,\ 1 \le i \le n. \tag{6.28}$$

Let the $J'$-th subset of $\{1,\ldots,M\}$ be the subset of indices $j$ for which $Y_{nj}' = \max_{1 \le i \le M} Y_{ni}'$ and let $\delta_n'(V_{ni},x) = \xi_n(J',x)$. Because the same $w_j^n$ and $\xi_n$ are used in the construction of $\delta_n$ and $\delta_n'$, we may expect that $\delta_n$ and $\delta_n'$ are close to each other. The limited power of the deleted estimate alluded to above is because of the following property.

Theorem 6.4. There exists a two-step rule $\{\delta_n\}$ with range $w_j^n \subset [0,1]$, $1 \le j \le M$, $n \ge 1$, such that for all $n$,

$$P\{|L_n^D - L_n| \ge 1/3\} > 1/e^2 \tag{6.29}$$

for some distribution of $(X,\theta)$.

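The main result for the deleted estimate is the following.

Theorem 6.5. Let $\{\delta_n\}$ be any two-step rule where $c_n$ is the ratio range of the $w_j^n$, $1 \le j \le M$. Then, for all $\epsilon > 0$ and for all $(\pi_1,\ldots,\pi_M,\mu_1,\ldots,\mu_M)$,

$$P\{|L_n^D - L_n| \ge \epsilon\} \le \epsilon^{-2}\big[2/n + 6C_1(M-1)c_n/\sqrt{n}\big] \tag{6.30}$$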

where 

C2  = 4/,/n  + 4(1+  li/zju  + 64^)^ 

and  where  Cj  is  the  universal  constant  of  the  Berry-Esseen  theorem 
(e.g.  ,Zolotarev(1966)  reports  that  C^sl.322  ). 

In the proof of theorem 6.5, we use a uniform version of the Berry-Esseen central limit theorem to show that for all distributions of $(X,\theta)$,

$$E\{(L_n^D - L_n)^2\} \le 2/n + 6C_1(M-1)c_n/\sqrt{n}. \tag{6.31}$$

We do not pretend that the constant in (6.30), (6.31) is the smallest possible and indeed, it is possible, by further restricting the class of rules $\{\delta_n\}$, to obtain much smaller constants. However, it turns out that the bounds in (6.30), (6.31) are the best possible with regard to their dependence on $n$. In particular, it is possible to show the following.

2/n  + 6C0(M-  l)c  /,/h; 

l n 1 


(6.30) 
Theorem 6.6. Let $M=2$, and $0 < \epsilon < 1/2$. There exists a two-step rule with $c_n = 1$ ($n \ge 1$) such that for all even $n$,

$$\sup_{(\pi_1,\ldots,\pi_M,\mu_1,\ldots,\mu_M)} P\{|L_n^D - L_n| \ge \epsilon\} \ge 1/\sqrt{2n} \tag{6.32}$$

and

$$\sup_{(\pi_1,\ldots,\pi_M,\mu_1,\ldots,\mu_M)} E\{(L_n^D - L_n)^2\} \ge 1/(4\sqrt{2n}). \tag{6.33}$$

Let us now briefly discuss the use of $L_n^D$ for the distance weighted decision rules of section 5.3. Consider first kernels $K$ taking values 0 and 1. Then, no matter how the statistician chooses $K$, the bounds (6.30) and (6.31) are valid with $c_n = 1$, independent of the choice of $\{h_n\}$. If $K$ takes integer values, say $0,1,2,\ldots,N$, then, independent of $h_n$ and the actual form of $K$, the bounds (6.30) and (6.31) are applicable with $c_n = N$. Unfortunately, some interesting kernels such as the gaussian have a compact range of the form $[0,b]$ and by theorem 6.4, we may suspect that $L_n^D$ is not a good distribution-free estimate of $L_n$ for such two-step rules. One way out for the statistician is to slightly change the decision rule. If range $K \subset [0,1]$, then he could replace $K$ by $K' = K + d_n$ in (5.21) where $d_n > 0$. In that case, the bounds (6.30) and (6.31) are valid with $c_n = (1+d_n)/d_n$. This modified rule is asymptotically equivalent in form to the original rule if $d_n \to 0$, and the bound (6.31) is useful if $n d_n^2 \to \infty$. It remains to be shown that the modified rule is asymptotically optimal under some condition on $\{d_n\}$ that does not contradict the condition $n d_n^2 \to \infty$. Another modification would be to replace $K$ in (5.21) by $K' = K\, I\{K \ge d_n\}$, in which case $c_n = 1/d_n$ if $d_n \le 1$.

To illustrate the fact that other bounds can be found if the statistician has some a priori knowledge about the distribution of $(X,\theta)$, consider the following example. Let all $\mu_j$ put mass 0 outside $[-r,+r]^m$ where $r$ is known to the statistician. Let $K$ be a given kernel with a range of values on $[0,1]$ and let the $w_j^n$ be as in (6.25). It is easy to see that (6.30), (6.31) can be applied in this case with

\[ c_n = 1 \Big/ \inf_{x \in [-2r,+2r]^m} K(x/h_n) . \]

Consider the case that $K$ is bell-shaped, i.e. $K$ is a monotonically nonincreasing function of $\|x\|$, say $K(x) = 1/(\beta + \gamma \|x\|^{m+\alpha})$ where $\alpha>0$, $\beta \ge 1$, $\gamma>0$. In that case, (6.30) and (6.31) are applicable with

\[ c_n = \beta + \gamma\,(2r\sqrt{m}/h_n)^{m+\alpha} . \]
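As a rough numerical companion to this example, the following sketch evaluates $c_n$ for the bell-shaped kernel and shows how it behaves relative to $\sqrt{n}$. It assumes the Euclidean norm, so that the largest norm on the cube $[-2r,2r]^m$ is $2r\sqrt{m}$; the function names and the bandwidth sequence $h_n = n^{-0.05}$ are hypothetical choices made only for illustration.

```python
import math

def bell_kernel(x_norm, m, alpha, beta, gamma):
    """K(x) = 1 / (beta + gamma * ||x||^(m + alpha)), evaluated at ||x|| = x_norm."""
    return 1.0 / (beta + gamma * x_norm ** (m + alpha))

def ratio_range_bound(r, m, h_n, alpha, beta, gamma):
    """c_n = 1 / inf K(x / h_n) over x in [-2r, 2r]^m.

    K is nonincreasing in ||x||, so the infimum is attained at a corner
    of the cube, where the Euclidean norm equals 2 r sqrt(m) (assumption:
    Euclidean norm).
    """
    worst_norm = 2.0 * r * math.sqrt(m) / h_n
    return 1.0 / bell_kernel(worst_norm, m, alpha, beta, gamma)

if __name__ == "__main__":
    r, m, alpha, beta, gamma = 1.0, 2, 1.0, 1.0, 1.0
    for n in (10**2, 10**4, 10**6):
        h_n = n ** (-0.05)  # hypothetical bandwidth sequence
        c_n = ratio_range_bound(r, m, h_n, alpha, beta, gamma)
        # The bounds (6.30), (6.31) are only informative when c_n / sqrt(n) is small.
        print(n, round(h_n, 4), round(c_n, 2), c_n / math.sqrt(n))
```

The printout makes the trade-off visible: a faster shrinking $h_n$ inflates $c_n$, and the bounds stay useful only while $c_n$ grows more slowly than $\sqrt{n}$.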


The holdout estimate $L_n^H$ can also be used as a distribution-free estimate of $L_n$. We will see that the bounds for $P\{|L_n^H - L_n| \ge \varepsilon\}$ are larger than the ones obtained for the deleted estimate. On the other hand, some statisticians may be attracted to the holdout estimate because of its simplicity. Most of the discussion for $L_n^D$ can be repeated for $L_n^H$. We will only state how to choose the holdout decision function $\bar\delta_n$ and then obtain a distribution-free upper bound for $P\{|L_n^H - L_n| \ge \varepsilon\}$.

Given $T_n$ and $x \in \mathbb{R}^m$, let $\bar\delta_n(T_n,x) = \xi_n(J',x)$ where the $J'$-th subset of $\{1,\dots,M\}$ is such that for every index $j$ in the subset, $Y'_{nj} = \max_{1\le i\le M} Y'_{ni}$, where $Y'_n = (Y'_{n1},\dots,Y'_{nM})$ is defined as $Y_n$ but with the mappings $w_j^{n-s_n}$, $1\le j\le M$. Thus,

\[ Y'_{nj}(T_n,x) = \sum_{i=1}^{n-s_n} w_j^{n-s_n}(X_i,x)\, I_{[\theta_i = j]} , \quad 1\le j\le M. \tag{6.34} \]

The main result for the holdout estimate is the following.

Theorem 6.7. Let $\{\delta_n\}$ be any two-step rule where $c_n$ is the ratio range of the $w_j^n$, $1\le j\le M$. Then for all $n$, all $\varepsilon>0$ and all distributions of $(X,\theta)$,

\[ P\{|L_n^H - L_n| \ge \varepsilon\} \le 2e^{-s_n\varepsilon^2/2} + \frac{2C_2(M-1)c_n s_n}{\varepsilon\sqrt{n-s_n+1}} \tag{6.35} \]

and

\[ E\{(L_n^H - L_n)^2\} \le \frac{1}{2s_n} + \frac{2C_2(M-1)c_n s_n}{\sqrt{n-s_n+1}} . \tag{6.36} \]

We remark that the bound (6.35) can be made to decrease as $c_n/n^{1/2-\delta}$ for arbitrarily small $\delta>0$ by properly choosing the sequence $\{s_n\}$. This rate of decrease is thus arbitrarily close to the rate of decrease of the bound (6.30) for the deleted estimate.
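The role of the holdout size $s_n$ can be explored numerically. The sketch below evaluates the right-hand side of (6.36), as read here, and searches by brute force for the $s_n$ that minimizes it. The constant $C_2$ is passed in as a parameter; the value 1.0 used in the demonstration is a placeholder and not the constant of lemma 6.2, and the function names are hypothetical.

```python
import math

def holdout_mse_bound(n, s_n, c_n, M, C2):
    """Right-hand side of (6.36): 1/(2 s_n) + 2 C2 (M-1) c_n s_n / sqrt(n - s_n + 1)."""
    return 1.0 / (2.0 * s_n) + 2.0 * C2 * (M - 1) * c_n * s_n / math.sqrt(n - s_n + 1)

def best_holdout_size(n, c_n, M, C2):
    """Brute-force search over s_n in {1, ..., n-1} for the smallest bound."""
    return min(range(1, n), key=lambda s: holdout_mse_bound(n, s, c_n, M, C2))

if __name__ == "__main__":
    n, c_n, M, C2 = 10_000, 1.0, 2, 1.0  # C2 = 1.0 is a placeholder constant
    s_star = best_holdout_size(n, c_n, M, C2)
    print(s_star, holdout_mse_bound(n, s_star, c_n, M, C2))
```

The first term of the bound favors a large holdout set and the second a small one, so the minimizing $s_n$ grows slowly with $n$ (roughly like $n^{1/4}$ for fixed $c_n$ in this placeholder setting).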


6.4  Linear  Discrimination  Rules 

A linear discrimination rule $\{\delta_n\}$ is a rule that is constructed as in (6.1) where

\[ Y_{nj}(V_n,x) = \sum_{i=0}^{m'} w_{ji}^n(V_n)\,\varphi_i(x) \tag{6.37} \]

where $m' \ge 1$, $\varphi_0,\dots,\varphi_{m'}$ are Borel measurable mappings from $\mathbb{R}^m$ to $\mathbb{R}$ with $\varphi_0 \equiv 1$, and the $w_j^n = (w_{j0}^n, w_{j1}^n, \dots, w_{jm'}^n)$, $1\le j\le M$, are Borel measurable mappings from $(\mathbb{R}^m \times \{1,\dots,M\})^n$ to $\mathbb{R}^{m'+1}$. We also require that the mappings $\xi_n$ not depend upon $x$. Linear discrimination rules are thoroughly investigated in the literature on parametric discrimination (e.g., see Duda and Hart (1973) or Ho and Agrawala (1968) for surveys). The basic property of these rules is that for any data sequence $V_n$, the set for which $Y_{nj}(V_n,x) > Y_{nk}(V_n,x)$, $j \ne k$, is a linear halfspace if $\varphi_0, \varphi_1, \dots, \varphi_{m'}$ are considered as the new variables.

We have seen that the resubstitution estimate is nearly useless to the statistician for most two-step rules. However we will show that $L_n^R$ is a very good distribution-free estimate of $L_n$ for linear discrimination rules. The qualification "very good" refers to the fact that it is possible to obtain an upper bound for $P\{|L_n^R - L_n| \ge \varepsilon\}$ that

(i) does not depend upon the distribution of $(X,\theta)$,

(ii) does not depend upon the way of choosing the $w_{ji}^n$, $1\le j\le M$, $0\le i\le m'$, and

(iii) decreases exponentially fast with $n$.

The results of this section complement the results of Devroye and Wagner (1976). Using the bound of Vapnik and Chervonenkis (1971) it is possible to prove the following nontrivial result.


Theorem 6.8. Let $\{\delta_n\}$ be any linear discrimination rule. Then, for all $\varepsilon>0$, all $n$ and all distributions of $(X,\theta)$,


No attempt has been made to obtain the best possible constants in theorem 6.8. The merit of the theorem is that the bound is valid for all ways of choosing the $w_{ji}^n$ from the data, and all possible (unknown) distributions of $(X,\theta)$. Mimicking the proof of theorem 6.8, we can also prove that for $L_n^{R'}$ and any linear discrimination rule,

sup  P { |LR  -L  la  c } 

. v 1 ' n n 1 

‘ni  , . . . ,TTj^  * > • • • / 

* 6M  (l+2n«/3M)2(m'+1)Me'nt  /18  M 2 . 

* 




6.5 Linear Ordering Rules and $\{k_n\}$-Local Rules

There is another large class of discrimination rules for which it is very easy to find useful distribution-free estimates of the conditional probability of error, that is, the class of $\{k_n\}$-local rules. First we will obtain distribution-free upper bounds for $P\{|L_n^D - L_n| \ge \varepsilon\}$ using techniques that were suggested by Cover (1969) and Rogers and Wagner (1976). Similar new upper bounds are obtained with the holdout estimate. We will see that the resubstitution estimate fails as a distribution-free estimate of $L_n$ for $\{k_n\}$-local rules in general but that $L_n^R$ is a useful distribution-free estimate of $L_n$ for $\{k_n\}$-nearest neighbor rules provided that $k_n$ grows large. Let us first clearly define the classes of rules that will be discussed in this section.


Let us enlarge $V_n$ in the following manner. Let $V'_n = (X_1,\theta_1,Z_1),\dots,(X_n,\theta_n,Z_n)$ where $Z_1,\dots,Z_n$ are random variables that are independent of $V_n$ and have the property that equality between any two $Z_i$'s occurs with probability zero. For example, we can let $Z_i = 1/i$, $i \ge 1$, or else, we can let $Z_1,\dots,Z_n$ be independent, all having a uniform distribution over $[0,1]$. Given $x \in \mathbb{R}^m$, the $Z_i$ are used to obtain a wp1 uniquely defined permutation $(X_1^x,\theta_1^x,Z_1^x),\dots,(X_n^x,\theta_n^x,Z_n^x)$ of $V'_n$ with the property that $\|X_1^x - x\| \le \dots \le \|X_n^x - x\|$ and if $\|X_i^x - x\| = \|X_{i+1}^x - x\|$, then $Z_i^x < Z_{i+1}^x$. Let $V_n^x = (X_1^x,\theta_1^x,Z_1^x),\dots,(X_{k_n}^x,\theta_{k_n}^x,Z_{k_n}^x)$ where $1\le k_n\le n$. We assume that $V'_n$ replaces $V_n$ in the definition of $L_n$ and that a decision function $\delta_n$ is a Borel measurable function of $V'_n$ and $x$. If $\delta_n$ is a Borel measurable function of $V_n^x$ and $x$ for all $n$, then we say that $\{\delta_n\}$ is a $\{k_n\}$-local rule where $1\le k_n\le n$ for all $n$. For such rules it is very easy to define a close deleted decision function $\bar\delta_n$. Let $V'_{ni}$ be the sequence that is obtained by deleting $(X_i,\theta_i,Z_i)$ from $V'_n$ and obtain $V_{ni}^x$ from $V'_{ni}$, using the same procedure to get $V_n^x$ from $V'_n$. Thus, both $V_n^x$ and $V_{ni}^x$ are random vectors taking values in $(\mathbb{R}^m \times \{1,\dots,M\} \times \mathbb{R})^{k_n}$. If $1\le k_n\le n-1$ and

\[ \delta_n(V'_n,x) = g_n(V_n^x,x) \tag{6.38} \]

for some vector function $g_n$, then let the deleted decision function be defined by

\[ \bar\delta_n(V'_{ni},x) = g_n(V_{ni}^x,x) , \quad 1\le i\le n. \tag{6.39} \]

Similarly, if $1\le s_n\le n$ and $1\le k_n\le n-s_n$, and if $T'_n = (X_1,\theta_1,Z_1),\dots,(X_{n-s_n},\theta_{n-s_n},Z_{n-s_n})$, we can obtain $T_n^x$ from $T'_n$ just as we obtained $V_n^x$ from $V'_n$. If $\delta_n$ is defined by (6.38), then let the holdout decision function be

\[ \bar\delta_n(T'_n,x) = g_n(T_n^x,x) . \]

The crucial observation in this section which will enable us to use the powerful theorems 6.1 and 6.2 is the following.

\begin{align*}
E\{|\delta_{n\theta}(V'_n,X) - \bar\delta_{n\theta}(T'_n,X)|\}
&= E\{|g_{n\theta}(V_n^X,X) - g_{n\theta}(T_n^X,X)|\} \\
&\le P\{V_n^X \ne T_n^X\} \le \sup_x P\{V_n^x \ne T_n^x\} \\
&= \sup_x P\Big\{ \bigcup_{i=1}^{s_n} \bigcup_{j=1}^{k_n} \{Z_{n-s_n+i} = Z_j^x\} \Big\} \\
&\le s_n k_n/n \tag{6.40}
\end{align*}

for the holdout decision function $\bar\delta_n$ when $\{\delta_n\}$ is a $\{k_n\}$-local rule. Similarly, for the deleted decision function $\bar\delta_n$,

\[ E\{|\delta_{n\theta}(V'_n,X) - \bar\delta_{n\theta}(V'_{ni},X)|\} \le k_n/n . \tag{6.41} \]

A simple combination of (6.40) and (6.41) with theorems 6.1 and 6.2 yields the following interesting results.

Theorem 6.9. If $\{\delta_n\}$ is any $\{k_n\}$-local rule, then for all $\varepsilon>0$,

\[ P\{|L_n^D - L_n| \ge \varepsilon\} \le (1+6k_n)/n\varepsilon^2 \tag{6.42} \]

and

\[ E\{(L_n^D - L_n)^2\} \le (1+6k_n)/n \tag{6.43} \]

for all distributions of $(X,\theta)$ provided that $1\le k_n\le n-1$.

Theorem 6.10. If $\{\delta_n\}$ is any $\{k_n\}$-local rule and if $1\le s_n\le n-k_n$, then for all $\varepsilon>0$,

\[ P\{|L_n^H - L_n| \ge \varepsilon\} \le 2e^{-s_n\varepsilon^2/2} + 2s_nk_n/n\varepsilon \tag{6.44} \]

and

\[ E\{(L_n^H - L_n)^2\} \le 1/2s_n + 2s_nk_n/n \tag{6.45} \]

for all distributions of $(X,\theta)$.
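The step behind (6.41) is that the full and the deleted neighbor lists can differ only when the deleted point is among the $k_n$ nearest neighbors of $x$, which by exchangeability happens with probability $k_n/n$. A small Monte Carlo sketch can check this; it assumes points drawn uniformly on $[0,1]$ (so distance ties have probability zero and the $Z_i$ are not needed), and the function name is hypothetical.

```python
import random

def prob_among_k_nearest(n, k, trials, seed=0):
    """Estimate P{point 0 is among the k nearest of n iid points to an iid query x}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.random()
        pts = [rng.random() for _ in range(n)]
        d0 = abs(pts[0] - x)
        # Number of other points strictly closer to x than point 0.
        closer = sum(1 for p in pts[1:] if abs(p - x) < d0)
        if closer < k:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    n, k = 20, 5
    print(prob_among_k_nearest(n, k, trials=20_000), k / n)
```

For the holdout case the same argument applied to each of the $s_n$ held-out points, followed by the union bound, gives the factor $s_n k_n/n$ in (6.40).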

Several remarks are in order. The bounds (6.42) and (6.43) tend to 0 as $n\to\infty$ if $k_n/n \to 0$. Very interestingly, it is always possible to find a sequence $\{s_n\}$ for the holdout estimate such that the bounds (6.44) and (6.45) tend to 0 as $n\to\infty$ provided that $k_n/n \to 0$ (just let $s_n \sim \sqrt{n/k_n}$). One rule to which the above mentioned theorems apply is the generalized nearest neighbor rule of section 5.2 where the sequence of weight vectors $v_n = (v_{n1},\dots,v_{nn})$ satisfies

\[ v_{ni} = 0 , \quad i > k_n ,\ n \ge 1 . \tag{6.46} \]
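To get a feel for the sizes involved, the bounds of theorems 6.9 and 6.10, as read here, can be tabulated directly. The sketch below is only illustrative: the function names are hypothetical, probability bounds are capped at 1, and the holdout size follows the suggestion $s_n \sim \sqrt{n/k_n}$.

```python
import math

def deleted_bounds(n, k_n, eps):
    """Bounds (6.42) and (6.43) for the deleted estimate of a {k_n}-local rule."""
    prob = (1 + 6 * k_n) / (n * eps ** 2)
    mse = (1 + 6 * k_n) / n
    return min(prob, 1.0), mse

def holdout_bounds(n, k_n, s_n, eps):
    """Bounds (6.44) and (6.45) for the holdout estimate of a {k_n}-local rule."""
    prob = 2 * math.exp(-s_n * eps ** 2 / 2) + 2 * s_n * k_n / (n * eps)
    mse = 1 / (2 * s_n) + 2 * s_n * k_n / n
    return min(prob, 1.0), mse

def suggested_holdout_size(n, k_n):
    """Holdout size of order sqrt(n / k_n), rounded and kept at least 1."""
    return max(1, round(math.sqrt(n / k_n)))

if __name__ == "__main__":
    for n in (10**3, 10**5, 10**7):
        k_n = 1
        s_n = suggested_holdout_size(n, k_n)
        print(n, deleted_bounds(n, k_n, 0.1), s_n, holdout_bounds(n, k_n, s_n, 0.1))
```

With $s_n$ of order $\sqrt{n/k_n}$ both terms of (6.45) are of order $\sqrt{k_n/n}$, which is why the holdout bounds vanish whenever $k_n/n \to 0$, although more slowly than the deleted bounds.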

A special class of $\{k_n\}$-local rules is the class of $\{k_n\}$-nearest neighbor rules. For these rules we can considerably improve (6.42)-(6.45).

A $\{k_n\}$-nearest neighbor rule $\{\delta_n\}$ is a sequence of decision functions constructed as (6.1) where

\[ Y_{nj}(V'_n,x) = \sum_{i=1}^{k_n} I_{[\theta_i^x = j]} , \quad 1\le j\le M. \tag{6.47} \]

Thus, $Y_{nj}$ counts the number of $(X_i^x,\theta_i^x,Z_i^x)$'s in $V_n^x$ for which $\theta_i^x = j$. We will let the deleted decision function be

\[ \bar\delta_n(V'_{ni},x) = \xi_n(J',x) , \quad 1\le i\le n , \tag{6.48} \]

where the $J'$-th subset of $\{1,\dots,M\}$ is the subset of indices for which $Y'_{nj} = \max_{1\le l\le M} Y'_{nl}$, where

\[ Y'_{nj}(V'_{ni},x) = \sum_{l=1}^{\min(k_n,\,n-1)} I_{[\theta_l^x = j]} , \quad 1\le j\le M ,\ 1\le i\le n , \tag{6.49} \]

and where $(X_1^x,\theta_1^x,Z_1^x),\dots,(X_{n-1}^x,\theta_{n-1}^x,Z_{n-1}^x)$ is the ordered permutation of $V'_{ni}$. With this definition we can define a deleted decision function for an $n$-nearest neighbor rule as well. A similar definition can be given for the holdout decision function. The main result for $\{k_n\}$-nearest neighbor rules is that both the deleted and the holdout estimate are useful tools for the statistician to estimate $L_n$. Indeed, it is not very hard to prove the following inequalities, all valid for $M=2$.

Theorem 6.11. Let $\{\delta_n\}$ be a $\{k_n\}$-nearest neighbor rule and let $\varepsilon>0$ be arbitrary. Then, if $1\le k_n\le n-1$,

\[ P\{|L_n^D - L_n| \ge \varepsilon\} \le (1 + 24\sqrt{k_n/\pi})/n\varepsilon^2 \]

and

\[ E\{(L_n^D - L_n)^2\} \le (1 + 24\sqrt{k_n/\pi})/n \]

for all distributions of $(X,\theta)$.

Theorem 6.12. Let $\{\delta_n\}$ be a $\{k_n\}$-nearest neighbor rule and let $\varepsilon>0$ be arbitrary. If $1\le k_n\le n-s_n$, then

\[ P\{|L_n^H - L_n| \ge \varepsilon\} \le 2e^{-s_n\varepsilon^2/2} + (8/\sqrt{\pi})\, s_n\sqrt{k_n}/n\varepsilon \]

and

\[ E\{(L_n^H - L_n)^2\} \le 1/2s_n + (8/\sqrt{\pi})\, s_n\sqrt{k_n}/n \]

for all distributions of $(X,\theta)$.

Notice that $k_n \le n$ so that

\[ E\{(L_n^D - L_n)^2\} \le 1/n + 24/\sqrt{\pi n} \tag{6.50} \]

for all the $\{k_n\}$-nearest neighbor rules and for all the distributions of $(X,\theta)$, independent of $k_n$. The rate of decrease of this bound is $1/\sqrt{n}$. We can show that over all the $\{k_n\}$-nearest neighbor rules, this rate is actually the best possible rate of decrease.

Theorem 6.13. If $M=2$, $n$ is even and $0<\varepsilon<1/2$, then there exists a $\{k_n\}$-nearest neighbor rule for some sequence $\{k_n\}$ such that

\[ \sup_{(\pi_1,\dots,\pi_M,\mu_1,\dots,\mu_M)} E\{(L_n^D - L_n)^2\} \ge \frac{1}{4\sqrt{2n}} \]

and

\[ \sup_{(\pi_1,\dots,\pi_M,\mu_1,\dots,\mu_M)} P\{|L_n^D - L_n| \ge \varepsilon\} \ge \frac{1}{\sqrt{2n}} . \]

The resubstitution estimate $L_n^R$ is not useful with the nearest neighbor rule ($k_n = 1$ for all $n$) because in that case $L_n^R = 0$. However, if $M=2$, $\pi_1=\pi_2=1/2$ and if $\mu_1$ and $\mu_2$ are uniform measures over $[0,1]$, then $L_n = 1/2$ for all $n$. Thus, $L_n - L_n^R = 1/2$ for some distribution of $(X,\theta)$ for all $n$. We state this as a theorem.

Theorem 6.14. If $\{\delta_n\}$ is a nearest neighbor rule and $M=2$, then there exists a distribution of $(X,\theta)$ such that $L_n = 1/2$ and $L_n^R = 0$ for all $n$.
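The contrast between the resubstitution and the deleted estimate for the nearest neighbor rule is easy to reproduce. The sketch below is a minimal illustration, assuming one-dimensional data, labels in $\{1,2\}$, odd $k$ (so that votes cannot tie) and distance ties broken by index in place of the $Z_i$; all function names are hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k nearest points; distance ties broken by index."""
    order = sorted(range(len(train)), key=lambda i: (abs(train[i][0] - x), i))
    votes_for_one = sum(1 for i in order[:k] if train[i][1] == 1)
    return 1 if 2 * votes_for_one > k else 2  # k odd avoids voting ties

def resubstitution_estimate(data, k):
    """Fraction of sample points misclassified when they stay in the sample."""
    return sum(1 for (x, y) in data if knn_predict(data, x, k) != y) / len(data)

def deleted_estimate(data, k):
    """Fraction of sample points misclassified after deleting each point in turn."""
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, x, k) != y:
            errors += 1
    return errors / len(data)

def make_uninformative_sample(n, seed):
    """X uniform on [0,1], state 1 or 2 with probability 1/2 each, independent of X."""
    rng = random.Random(seed)
    return [(rng.random(), rng.choice((1, 2))) for _ in range(n)]

if __name__ == "__main__":
    data = make_uninformative_sample(200, seed=1)
    print(resubstitution_estimate(data, 1), deleted_estimate(data, 1))
```

In this setting every rule has $L_n = 1/2$, the resubstitution estimate of the 1-nearest neighbor rule is 0 because each point is its own nearest neighbor, and the deleted estimate fluctuates around 1/2.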

However, for $\{k_n\}$-nearest neighbor rules, the resubstitution estimate is useful if $k_n$ grows large. We could of course expect this because the deleted and the resubstitution estimate are close to each other for large $k_n$. The following result should be compared with the theorems 6.9 and 6.11 for deleted estimates.


Theorem 6.15. Let $M=2$ and let $\{\delta_n\}$ be a $\{k_n\}$-nearest neighbor rule. Then, for all the distributions of $(X,\theta)$,

\[ E\{(L_n^R - L_n)^2\} \le 2(1 + 24\sqrt{k_n/\pi})/n + 8/\sqrt{\pi k_n} \]

and

\[ E\{(L_n^R - L_n)^2\} \le 2(1 + 6k_n)/n + 8/\sqrt{\pi k_n} . \tag{6.51} \]

The $\{k_n\}$-nearest neighbor rules are but a special case of linear ordering rules, that is, rules defined by (6.1) with

\[ Y_{nj}(V'_n,x) = \sum_{i=1}^{n} w_{ji}^n(X_i^x,\theta_i^x,x)\, I_{[\theta_i^x = j]} , \]

where the $w_{ji}^n$ are real-valued Borel measurable functions on $\mathbb{R}^m \times \{1,\dots,M\} \times \mathbb{R}^m$, $1\le j\le M$, $1\le i\le n$, $n\ge 1$. For $\{k_n\}$-nearest neighbor rules, it is clear that $w_{ji}^n = 1$ if $1\le i\le k_n$ and $w_{ji}^n = 0$ otherwise. The powerful rule of section 5.2 is an example of a linear ordering rule with $w_{ji}^n = v_{ni}$. We have seen that $P\{|L_n^R - L_n| \ge \varepsilon\}$ cannot be upper bounded by a function of $n$ that decreases to 0 as $n\to\infty$ uniformly over all distributions of $(X,\theta)$ and over all the sequences of weight vectors $\{v_n\}$ (just let $v_{n1} = 1$, $v_{ni} = 0$, $i>1$). However, it remains an open question whether such a bound can be found with $L_n^D$. One property that may be exploited in trying to find such a bound is that given $X_1^x,\dots,X_n^x$, the $w_{ji}^n(X_i^x,\theta_i^x,x)\, I_{[\theta_i^x = j]}$, $1\le i\le n$, are independent random variables.

6.6 Proofs

Proof  of  theorem  6.1 

Let  (X  , S)  ,(X  , 6 ),...,  (X  ,9  ),(X,0)  be  iid  random  vectors 
0 0 11  n n 

distributed  as  (X,  6). Note  first  that 


and 


ElLn1  = Et(Pl\y#|Vn))2) 

*Etptv  ,x/%;  ev  y8iv„)> 

n 0 n 

-Et(1-‘n80<Vn-X0»(1-6n,(Vn’X»)' 

- ®t(I-.n,(Vn.»)»-Tn,i(Vnl.Xl»J. 

D 2 _2t^> 

E{(En  ) )=E(n  £l,  , ,} 

" 1 V.x/  V 

n i 

l9vri,Xl^  Svri j ■ x j ^ 6J  ‘ ' 


+ n'2E{V  t, 


= „-1e(1-6  (V  Xj)) 


MH/nlEld-^  (Vnl.X1))U-?n  (Vn2.X2))) 


With  6'  = 1- 6 ,6'  = 1-6  , we  obtain 

n n n n 


Et(LnD-En)2)  - E (6;  9o(Vn  ,X0)  «n  e(Vn  .XH  ? 9Wnl , X,)  ? ^ X2) 


-2  Wvn-x)5i1.,(vnrxi)> 


8 P'1  Et*n«,(VnrXl>‘Sii9,1Vnl'Xl)  8ne.(Vn2,X2) 


w « 


j 
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The  last  term  is  clearly  upper-bounded  by  1/n.  Now, let  a,b€ 

{0.1, 2, c}, let  (X  ,8>(X,8),lpt  V =(X  , 8 ) , (X  , 8 ) , (X  , 0 ) 

1 c c ab  a a b b 3 o 

(X  , 0 ) and  let  V =(X  . 8J  , (X_  , 0») , . . . , (X  , 9 ) .With  this  notation, 
nn  ddaoo  n n 

we  have 

E«LnDVl  ‘ 1/n+  Ei6ne0<V12-X0>5ne(V12-X> 

* \ 9)  (V2  'Xl»  92 <vi  • x2>  - 2 6n  e(V12 ^ 0, (V2  ' Xl>! 

WV12'X)WVX1>1 


and 


- £l  VV'Xl’ VV°C'X2’' VV°2,Xl>  VV°'X2)  1 

-Et‘;6l(V0c’Xl)6;e2(V0c’X2|-r«1IV0'Xl,6n92(V0c-X2> 
tSiei(V0-X1)6;S2(V'X2»-6;ei(V02’Xl)6n.2(V0c'X2> 
+ 5;ei(V02'Xl»5;92'V0c-X2»-^,1(VO2'Xl>rU2<VO’X2)  1 
* 3EH6n9(Vl2'X>-rn9<V2'X)  l> 

- 3Eti«n9(v12-»-‘n9<v2-»n  • 

Further, in  a similar  fashion, 

El?;9l(v2-Xi,rk92<vrx2>-  1 

- E(!’„9l<V2'Xl)S’n92(Vl'X2»-5n9l(Vc2’Xl)?n92(Vc'X2)  > 

“ E t «n  e, (V2 ' X 1>  ^ ,2 < V!  • x2> * 9 2 (V2  • X 1>  6n  8, (Vc  1 ' X2> 

* ?n  9l  (v2  ■ Xl»  ««2(Vc  1 • x2>  - s;  9[  (vc2 ' Xl» ,2(Vc  ! ■ x2' 

* Sn9l(Vc2'Xl»6„92(VcrX2»-  'U1«Vc2-Xl»S'ne2'Vo-X2»’ 
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*3E{|6„,(V12.X)-rn,(V2.X)  |). 

This  concludes  the  proof  of  theorem  6.1. 

Q.E.D. 

Proof of theorem 6.2
By 

'VC!  ‘ lL„‘ElLn  lT„'l+  . 

lLn-EtLnM=  fPt*V  y»|V„)-PteT  y«|Tn)| 

n n 

«Btl»ne<vn.»-r,(T„.)o|  |vn) . 
E‘(E0T„^nH)2|Tn»tl/4V 

the  cr-inequality  and  the  Cauchy  inequality  for  conditional  ex- 
pectations, we  have, 

E « V1"  »2  5 ‘ 2(E«VE  (LnH  lTn>  »2 ) + Et,L„H-E  lL„H  lT„))2 )) 

‘2E<(EtlV<Vn’X>-rn,(Tn'X)lM2)  * 1/2sn 

‘2Etl5„.(Vn-X)-‘n.(Tn'X)  lV  ‘/2sn  • 

Next,  2 

P{|E{L”|Tn}-L^|i  « |Tn}  « 2e"2Sn*  wpl  ,«>0, 

implies  that 

PtlVLnH!*«) 

‘ P (|Ln-E(LnH  |In)  |*./2  ) ♦ P(  |tnH-PtI.nH  |T„)  |*  «/2] 

2. 

* 2='*"'  n +H/«2)E((E(|Jne(Vn,X)-T„e(Tn,X)  | |Vn))2) 


' ’laWCii  . 
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‘ 2 e-s„,/24-E(|6n#(Vn,X)-4nt(Tn.X)|2). 


Finally,  (6.23)  is  proved  as  (6.22) 

Proof  of  theorem  6.3 

Let  M=2  and  let 


w“<*.y>  - I{||x_y||  j , lfij  <M,x,  y € Fm  . 


O.E.D. 


Then 


((1,0) 


6n(vn'x) 


:5n(3,x) 


lf  inl(Vn'X)>'rn9(V«'X) 

nl  n nZ  n 

lf  ^l(Vn'X><Y„9(Vn'X) 

nl  n nz  n 

if  Y ,(V  ,x) 

nl  n nl  n 


where  for  every  x,  § (3,x)  is  an  arbitrary  probability  vector. Let 


^ put  mass  l/2n  at  l,2,...,2n  and  let  ^ put  mass  0 at  those 

x’s  for  which  5^  (3  ,x)z  fc.Let  n2  put  mass  0 at  those  x's  for 

which  c ,(3,x)<  $. Then, if  N.k  is  the  number  of  X.'s  with  X =k 
*nl  J n i i 

and  j , we  have 

L =—  V IfMk  Mk  . Max(§  (3,k).5  ,(3,k)  ) 
n « {N , „=N0^  = 0 j nl  nZ 


2 n k=l  i”ln""2n 
i Jn/2n  i ^ 


and 


r R n 

L = 0. 
n 


m 


The  same  proof  carries  through  if 

where  is  any  countable  partition  of  Rm,  provided 

that  arKl  l*j  Puts  mass  l/2n  ln  Bi',,*,B2n  and  mass  0 
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2n 

outside  U B.  . 
i=l 

Q.E.D. 

Proof of theorem 6.4

Assume without loss of generality that $n \ge 3$ and let $p = 1/n$ and pick a number $a$ from $(1/(n-1),\, 1/(n-2))$. Construct a two-step rule with

\[ w_j^n(y,x) = a^{\,j-1} , \quad j=1,2 ;\ x,y \in \mathbb{R}^m , \]

that is, $w_1^n = 1$ and $w_2^n = a$. Let $\pi_1 = p$, $\pi_2 = 1-p$ and let $\mu_1$ and $\mu_2$ both put mass 1 at 0. Let $N_{jn}$ be the number of $X_i$'s with $\theta_i = j$, $j=1,2$. Notice that $N_{1n} + N_{2n} = n$ and that the equality $N_{1n} = N_{2n}a$ cannot occur by the choice of $a$. Since $X = X_1 = X_2 = \dots = X_n = 0$ wp1, we see that wp1,

\[ \delta_{n1}(V_n,X) = \begin{cases} 1 & \text{if } N_{1n} > N_{2n}a \\ 0 & \text{if } N_{1n} < N_{2n}a . \end{cases} \]

If $N_{1n} = 1$ and $N_{2n} = n-1$, then $L_n = p$ in view of $(n-1)a > 1$. Because $(n-2)a < 1$, it is obvious that in that case, $L_n^D = (n-1)/n$. Thus,

\begin{align*}
P\{|L_n - L_n^D| \ge (n-1)/n - p\} &= P\{|L_n - L_n^D| \ge (n-2)/n\} \\
&\ge P\{N_{1n} = 1\} = \binom{n}{1} p(1-p)^{n-1} \ge np(1-p)^n = (1-p)^n \\
&\ge e^{-np/(1-p)} > e^{-2}
\end{align*}

because $p < 1/2$. Inequality (6.29) follows from this and $(n-2)/n \ge 1/3$. Further,

\[ E\{(L_n - L_n^D)^2\} \ge P\{|L_n - L_n^D| \ge 1/3\}/9 > 1/9e^2 . \]

Notice that the same is true if

\[ w_j^n(y,x) = 1/(1+\|y\|)^{\lambda} , \quad j=1,2 ;\ x,y \in \mathbb{R}^m , \]

where $\lambda > 0$, provided that $\pi_1 = p$, $\pi_2 = 1-p$, and that $\mu_1$ and $\mu_2$ put mass 1 at 0 and $z$ respectively, where $z$ is chosen such that $1/(1+\|z\|)^{\lambda} = a$.

Q.E.D.
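The two facts that drive this construction can be checked numerically: with $p = 1/n$ the event $\{N_{1n} = 1\}$ has probability larger than $e^{-2}$, and for $a$ in the open interval $(1/(n-1), 1/(n-2))$ the decision flips from state 2 to state 1 as soon as one state-2 point is deleted. The sketch below takes $a$ to be the midpoint of the interval, which is an arbitrary choice, and uses hypothetical function names.

```python
import math

def prob_single_minority_point(n):
    """P{N_1n = 1} when pi_1 = p = 1/n, that is, n p (1-p)^(n-1)."""
    p = 1.0 / n
    return n * p * (1.0 - p) ** (n - 1)

def decide_state_one(n1, n2, a):
    """Two-step rule of the proof: state 1 is chosen when N_1n > a N_2n."""
    return n1 > a * n2

def check_construction(n):
    """Return the decisions with the full sample and with one state-2 point deleted."""
    a = 0.5 * (1.0 / (n - 1) + 1.0 / (n - 2))  # midpoint of the open interval
    full_sample = decide_state_one(1, n - 1, a)   # expected: state 2 wins
    one_deleted = decide_state_one(1, n - 2, a)   # expected: state 1 wins
    return full_sample, one_deleted

if __name__ == "__main__":
    for n in (3, 10, 100):
        print(n, prob_single_minority_point(n), math.exp(-2), check_construction(n))
```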

Proofs  of  theorems  6.5  and  6.7 

The mathematical machinery needed to prove these theorems is rather heavy. To see what is going on, we extracted the following lemmas that can be of separate interest to the reader. The lemmas are special versions of the Berry-Esseen central limit theorem (Feller, 1966; Osipov and Petrov, 1967; Hertz, 1969).

Lemma 6.1. If $a,b \in \mathbb{R}$ with $a<b$, if $Y_1,\dots,Y_n$ are independent identically distributed random variables with variance $\sigma^2 < \infty$ and $|Y_1 - E\{Y_1\}| \le g < \infty$ wp1, then

\[ P\Big\{ a < \sum_{i=1}^{n} Y_i \big/ \sigma\sqrt{n} \le b \Big\} \le (b-a)/\sqrt{2\pi} + 2C_0 g/\sigma\sqrt{n} \]

where

\[ C_0 = 1 + 14/3\sqrt{2\pi} + 64\,C_1 \]

and $C_1$ is the universal constant in the Berry-Esseen theorem (e.g., according to Zolotarev (1966), $C_1 < 1.322$).

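Proof of lemma 6.1

Let

\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt , \quad x \in \mathbb{R}. \]

From Hertz (1969) we know that

\[ \sup_x \Big| P\Big\{ \sum_{i=1}^{n} (Y_i - E\{Y_i\}) \big/ \sigma\sqrt{n} \le x \Big\} - \Phi(x) \Big| \le \frac{C_0}{(\sigma\sqrt{n})^3}\; n \int_0^{\sigma\sqrt{n}} \int_{\{x:\,|x|>u\}} x^2\,dF(x)\,du \]

where $F$ is the distribution function of $Y_1 - E\{Y_1\}$. We bound the last expression by

\[ C_0\, n g \sigma^2 / \sigma^3 n^{3/2} \le C_0 g/\sigma\sqrt{n} , \]

and lemma 6.1 follows from this and $\Phi(b) - \Phi(a) \le (b-a)/\sqrt{2\pi}$.

Q.E.D.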

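Lemma 6.2. If $Y_1,\dots,Y_n,Z_1,\dots,Z_s$ are independent identically distributed random variables taking values in $[-1,-c] \cup \{0\} \cup [c,1]$ where $0<c\le 1$, then

\[ P\Big\{ \operatorname{sgn}\Big( \sum_{i=1}^{n} Y_i + \sum_{j=1}^{s} Z_j \Big) \ne \operatorname{sgn}\Big( \sum_{i=1}^{n} Y_i \Big) \Big\} \le sC_2/c\sqrt{n+1} \]

where

\[ C_2 = 4/\sqrt{\pi} + 4C_0\sqrt{2} \]

and where $\operatorname{sgn}(\cdot)$ is the sign function,

\[ \operatorname{sgn}(u) = \begin{cases} 1 & \text{if } u>0 \\ 0 & \text{if } u<0 \\ 1/2 & \text{if } u=0 . \end{cases} \]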

Proof  of  lemma  6.2 
n 

Let  Y= 


iv 


n s n 

1 ‘(Z/0)  'V^j 


,S„=>  . Y,  and 


1 


Let  A denote  the  event  (sgn(SY+Sz)  / sgn(Sv) } .Obviously, 


8 n 

P {A}  J P {Y=y )P {Z=z }P {A | Y=y , Z=z  }. 

z*l  y=0 


Let 

2 2 2 1 
X <p  c /2  ,then 


J = pE{Y1  | Y^ 0 } where  p=P {Yj/ 3 } . If 
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aQ2  =E{(Y1-E{Y1|Y1^0})2|Y1/0} 

= E {Y2  lYj/O)  - (E{Y1  |Y^0})2  * c2-c2/2  = c2/2. 

In  that  case, by  lemma  6.1, 

P{A|Y=y,Z=z}*P{|SY|*z  | Y=y  } 

« Min  1 1 ; (2CQ+2z/V2^)/(a0Vy)) 
s Min|  1 ; (2CqV2+  2z//n  )/(c/y ))  . 

If  \2ap2c2/2,then,by  Chebyshev's  inequality,  if  q=x/p=E  {Y1  (Y^O } , 

P {A |Y=y ,Z=z  } < P{SY+Sz<0,Sys0  |Y=y,Z=z  } 

+ P{Sy+Sz^0,SY<0  |Y=y,Z=z  } 

* P{Sy+Sz-(y+z)q< -(y+z)q  |Y=y,Z=z  } 

+ P {Sy-yQ  <-yq  |Y=y,Z=z  } 

< 2 c 2/  yq2  * 4/yc2 . 

2 2 — — — 

Note  that  if  4/yc  <l,then  4/yc  &2/cJy  * 2J2  CQ/cJy  so  that, 
since  z 2 1 in  the  summation, 

P{A | Y=y,Z=z } t z Min/l  , C3/2c/y) 

' s 

where  C3=  2/^/tt  + 2CqN/2  £ C2/2  .Noting  that  zP[Z=z}  = sp.we 
have, when  L is  a binomial  random  variable  z_  with  parameters 
(n+1)  and  p, 

P{A}*£  8 0 py+IU-P)n"yMin(l  ; G/c,/y\ 
y=0  Y 

-*t0  SiO  py+I(1-p)n''’  mi4  ••  v*^) 

* (s/(rH-l))  E|Min/  L+l  ; C3(L4-1)/c/l)  j 
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£ (s/(n+l))  eJ  Min(  L+l  ; (VL  + IJCj/c)  j 
£ (s/(n+l))  Min|  E(L}+1  ; (JE  {L}  + l)Gj/c) 

£ (sGj/c(n-t-l))  ( 1+  J (n+l)p  | £ 2sGj/c^/n+l 

where  we  used  Jensen's  inequality,  l£,/n+l  and  p£l. 


Q.E.D. 


Lemma 6.3. Let $\{\delta_n\}$ be a two-step rule, let $\bar\delta_n$ be the holdout decision function corresponding to $\delta_n$ and let $\delta_n$ have ratio range $c_n$. Then

\[ \sup_{(\pi_1,\dots,\pi_M,\mu_1,\dots,\mu_M)}\ \sup_{x\in\mathbb{R}^m,\ 1\le j\le M} E\{|\delta_{nj}(V_n,x) - \bar\delta_{nj}(T_n,x)|\} \le C_2(M-1)c_n s_n/\sqrt{n-s_n+1} . \]

If $\bar\delta_n$ is the deleted decision function, then the inequality holds with $T_n = V_{n1}$ and $s_n = 1$.
n nl  n 

Proof  of  lemma  6.3 

Let  V **  flX6d  and  l6t  x€Rm-Con8ider 

first  the  holdout  decision  function  't  .The  proof  for  the  deleted 

n 

decision  function  is  similar. 


M n *.»  n-s 

£P{U  { sgn(V  Y U))  / sgn(V  Y.  J ) }} 
J=2  1=1  1 i=l 


where 


l0,-It,l.irr«l-x>-I(Vnw)<xi 


w"(X,,x)  , 1 £i£n,2  £j  £M  . 


We  used  the  fact  that  by  the  definition  of  and 


A. 
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M 

n 

J=2 


(s»n<£  Y.0  Wg"  Y.' 
i=l  i=l 


(j) 


)}n[6nl(vn.x)Anlnn,x))j 


Is  empty. Since  Y^ Y ^ are  Independent, identically  distri- 

buted  random  variables  taking  values  in  {0}tj[l  .c^lut-c  , -l] 
(without  loss  of  generality) , we  thus  have  by  lemma  6.2, 


E<l,nl(V’d-ynl<V*>U‘<M- 


1)C2SnCn/^"-V1 


Q.E.D. 

To show theorem 6.7, we use (6.21) and lemma 6.3 as follows.

\begin{align*}
E\{(L_n - L_n^H)^2\} &\le 1/2s_n + 2E\{|\delta_{n\theta}(V_n,X) - \bar\delta_{n\theta}(T_n,X)|\} \\
&\le 1/2s_n + 2 \sup_{x,\ 1\le j\le M} E\{|\delta_{nj}(V_n,x) - \bar\delta_{nj}(T_n,x)|\} \\
&\le 1/2s_n + 2C_2(M-1)c_n s_n/\sqrt{n-s_n+1} .
\end{align*}

This bound is uniform over all distributions of $(X,\theta)$. Similarly, (6.35) follows from (6.23) and lemma 6.3. Because $\delta_n$ and the deleted decision function $\bar\delta_n$ are symmetric, theorem 6.1 and inequality (6.15) can be used in combination with lemma 6.3 (where $s_n = 1$ and $T_n = V_{ni}$) to prove (6.30) and (6.31).

Q.E.D.

Proof  of  theorem  6.6 

Let $M=2$ and let $n$ be an even positive integer. Let $\pi_1 = \pi_2 = 1/2$, let $\mu_1 = \mu_2$ and let $0<\varepsilon<1/2$. It is obvious that with any $\delta_n$, $L_n = 1/2$. Consider for instance the decision function $\delta_n$ with ratio range $c_n = 1$ that is constructed as follows. Let

\[ w_j^n(y,x) = 1 , \quad y,x \in \mathbb{R}^m ;\ 1\le j\le M , \]

and let $N_{1n}$ be the number of observations for which $\theta_i = 1$. A two-step rule that uses these $w_j^n$, $j=1,2$, is defined by

\[ \delta_{n1}(V_n,x) = \begin{cases} 1 & \text{if } N_{1n} > n/2 \\ 1/2 & \text{if } N_{1n} = n/2 \\ 0 & \text{if } N_{1n} < n/2 . \end{cases} \]

The deleted decision function, constructed as outlined in section 6.3, is such that $L_n^D = 1$ if $N_{1n} = n/2$. Thus,

\[ E\{(L_n - L_n^D)^2\} \ge (1/2)^2 P\{N_{1n} = n/2\} = \frac{1}{4}\binom{n}{n/2} 2^{-n} \ge \frac{1}{4\sqrt{2n}} \]

where we used an inequality of Mitrinovic (1964). Further,

\[ P\{|L_n^D - L_n| \ge \varepsilon\} \ge P\{N_{1n} = n/2\} \ge 1/\sqrt{2n} . \]

The same results are obtained with

\[ w_j^n(y,x) = I_{[\|y-x\| \le h_n]} \]

where $h_n > 0$, provided that $\mu_1$, $\mu_2$ both put all their mass in a set $S$ which has a diameter that is smaller than $h_n$.

Q.E.D.
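The binomial inequality used in this proof, $\binom{n}{n/2}2^{-n} \ge 1/\sqrt{2n}$ for even $n$, is simple to verify exactly for moderate $n$. The sketch below does so with integer arithmetic for the binomial coefficient; the function name is hypothetical.

```python
import math

def prob_even_split(n):
    """P{N_1n = n/2} for N_1n binomial(n, 1/2) and n even."""
    return math.comb(n, n // 2) / 2 ** n

if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        lower = 1.0 / math.sqrt(2 * n)
        print(n, prob_even_split(n), lower, prob_even_split(n) >= lower)
```

For $n = 2$ the two sides coincide (both equal 1/2), and for large $n$ the ratio of the left side to the right side approaches $2/\sqrt{\pi}$, so the bound loses only a constant factor.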


Proof  of  theorem  6.8 

Let  it . ,uw  be  fixed. Notice  that  for  ali  «>0, 

1 Ml  M 

I*  . ) * P(  |LnR-LnR'  |,  ,/2  ) * P { |LR'-tn  I * «/2  ) 

2 

*2e'n'/2+P(|LR'-LnU./21. 

We  will  upper  bound  P{|L^  -Ln  |a  «/2}.Let 

Ln-RleV  ./>  |V„)-Etl-»nj(vn.X)|v„).l«l«M, 

n m 

where  X has  probability  measure  ^ in  K .Let 


I 


i 
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N.  =Y'  Iffi  . , , l*j  <M , 

Jn  fel  t®i=j  * 

be  the  sample  sizes  of  the  M states. Let  further  for  lsj*M, 


and 


■„R,)=  £i(Vj}<1-‘»i(v*i»  /Nm 


"in  * Nln/n  ' 


It  should  be  clear  that 
M 


r R'  r V1  / T R'j  r J \ 

L -L  = ]>  V TTj  L -TT.L  ) . 
n n ]n  n j n 


M 


j=l 

If  a>0  and  l-2a>0,then 

P^Ln"Ln  Is  e/2  5 s Pt  U t lnjn'nj  I*  as/2M)} 

M R, 

+ P{  nx  {|TTjn-n.  |<ae/2M  },  |Ln  -Ljic/2  } 


s 2M  e 


2 7 2 

-na  « /2M 


M M R.j  , 

+ p{  n (Itt.-tt  |<ae/2M),C  " (L  J-L^)  |2(l-a)  «/2  } 

j=l  Jn  1 j=l  J 


i 2Me 
M 


2 2 2 
-naV/2M 


xvx.  Di{  \ 

+2j  p(  l"jn'"l  l<a«/2M  ■ lLn  ■Lnli,1-",*/2M")) 


< 2M  e 


2 2 2 

-na  « /2M 


M R I j j 

+ P{Njn2  ne(I-2a)/2M  , |Lr  J-L^  (1-a)  c^MtTj  } 


where  we  used  Hoeffdlng's  inequality  (1963)  and  the  fact  that  we 
can  assume  that  in  the  last  tw<?  events,  (1-a)  €/2Mtt. * 1 (otherwise 
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the  probabilities  of  the  said  events  would  be  zero  in  view  of 

|LR’j-Lj  Is  1 ) . But 
1 n n 1 

t Is  a6/2M  (T1.s  (1-a)  */2M  }c  (N ,nsne(l-2a)/2M  ] 

which  explains  the  last  step  in  the  chain  of  inequalities . Next, 
P{Njn*  nc(l-2a)/2M  , |LR'j-lJ  |*  (1-a)  c/2Mn.  } 

* ess  sup  P{  |LR  U (1-a)  e/2Mn.  |N,  1. 

N.  2 nc(l-2a)/2M  n n 1 jn 

in 

We  will  show  that  for  all  P>0  and  k>0,k  integer, 

P{|LR  -L^|2  3 |Njn=k  ) s 4s(.tf2M,2k)e~k0  /2 
2M 

where  M is  the  class  of  all  2M-fold  intersections  of  open 

m 1 

or  closed  linear  halfspaces  in  1R  and  where  the  function  s is 
defined  in  section  3. 2. In  particular , from  lemmas  3.2  and  3.6  we 


know  that 


s(.X'2M,2IO  s (s(jr',2k))  ZM 

* (l+  (2k)m  + 1 1 2M  s (lt2k)2(m'*1)M 


Combining  bounds^with  9=(l-a)c/2Mrr,  and  k=(l-2a)n«/2M,  yields , 
upon  noting  that  ttsI  for  all  j, 

P(|LR-L  «)  < 2e_Ils  /2 -t-  2M  e'na  8 /2M 

^ ft  //l  r»  , 


+ 4M(l+(l-2a)ne/M)2(m 


..  /ll-2a)nc\/ll^a)c\  J 
'+  1)M  \ 2M  A 2M  / / 


from  which  theorem  6.8  follows  if  we  let  a=l/3. 

We  need  only  show  that  for  all  B>0  and  k>0,k  integer, 

P(|LR'j-LJ  |2p  IN  =k]  « 4s(jr2M,2k)e"kp2/2 
1 1 n n 1 > Jn  J 


1 


„%(x)  = v w)i'V  Vx)  ■“)IM  ■xcR”1' 

J £6 

and  let  denote  the  empirical  measure  based  on  the  k obser- 
vations X,  for  which  e.  = l.For  given  V .define  the  following 

* m ^ 

subsets  of  F , 


A = u (x  : w%(x)<  w^x)  1 
j=2  J 


and, with  OsN<M  and  {i1#...,i  }c  , 

A i = tx  : w ncp(x)=w  n m(x)  = . . .=w  n m(x) 
lll NJ  *1  lN 


> sup  W.  cp(x)  } 


Then, 


Ln<1  - “/V’-ik'V 


11  sSsets  J of  ^ nl 


(J*  ,x))  Hj(dx) 


{2 M} 


= W~*lk(V 


’ „ £ T f S{V-“lk(V)?nl(I*'0) 

all  subsets  J of 
{2 M} 

where  J*  is  the  subset  obtained  by  taking  the  union  of  {1}  and 

fi  , . . • ,i.,],the  J-th  subset, Notice  that  we  used  the  fact  that 
1 1 N 

? (J,x)=§  (1,0)  for  all  x and  all  J.So, 
n n 

lL„1-Ln'1|  ‘ K(V  - “lk<V  I 
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+ 2M_1  sup  |j*  (AJ-*  (A)  | . 

all  subsets  J of 
{ 2 M } 

Because  1 t*x (AQ) -|fclk(A0)  1=  } I and  Ao°  iS  a Set  from 

M-l  2M 

M c .Jr  , and  because  for  all  subsets  J from  [2  , . . . , M },with 


Mi, 


’V* 


A.  = n {x:  w ncp(x)>  w"cp(x)  1 n (x  : w n«p(x)  sw  %(x)  1 

j^J*  J Jt<=j  * 

n (x:w  n(p(x)<wVx)) 

£€J  * 

. M-N-1+2N  M+-N-1  2 (M-l)  2M  L 

Is  a set  from  .*r  c Jif  ,we  thus  have 

for  all  V that 
n 

lLn‘Ln  V"  sup  l»i(A)-»lk(«l 
A**2*1 

and, by  (3. 15), 

P(|L1-I.R'1|2Bl«48tx2M.2k)e-k(e/2Mi2/8. 

' n n 1 

The  proof  of  the  last  inequality  of  section  6.4  is  similar 

— n 2 73 

if  one  replaces  t by  2t  and  omits  the  term  2e  € in  the  bound. 
In  fact, we  obtain  the  intermediary  result 


2 2 2 

or  ir  t R'  i , -2na  c /M 

p{lLn-Ln  la«  ) * 2Me 


4M  |l+2(l-2a)n«/Mj 


2 (m 


/(l-2a)nc\/(l-a)\ 

'+  1)M  \ M / \ M / ^ 


2M+3 


Q.E.D. 


Proof  of  theorems  6.11  and  6.12 

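Theorems 6.11 and 6.12 follow immediately from theorems 6.1 and 6.2, from the symmetry of $\delta_n$ and $\bar\delta_n$, from

\[ E\{|\delta_{n\theta}(V'_n,X) - \bar\delta_{n\theta}(T'_n,X)|\} \le P\{\delta_{n\theta}(V'_n,X) \ne \bar\delta_{n\theta}(T'_n,X)\} , \]

and lemma 6.4 given below.

Lemma 6.4. Let $M=2$ and let $\{\delta_n\}$ be a $\{k_n\}$-nearest neighbor rule. Then, with the holdout decision function $\bar\delta_n$,

\[ P\{\delta_{n\theta}(V'_n,X) \ne \bar\delta_{n\theta}(T'_n,X)\} \le (4/\sqrt{\pi})\, s_n\sqrt{k_n}/n \]

and with the deleted decision function $\bar\delta_n$,

\[ P\{\delta_{n\theta}(V'_n,X) \ne \bar\delta_{n\theta}(V'_{ni},X)\} \le (4/\sqrt{\pi})\, \sqrt{k_n}/n . \]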

Proof  of  lemma  6.4 

We  first  show  that  if  Y is  a binomial  random  variable  with 
parameters  n and  £,then  for  any  integer  asl, 

P{  |Y- n/2  |<  a/2  ) *j(a*-l)2 , n even 
(a  2/2  /J2nn  , n odd 
< 2(a+l)//nn  s 4a// nn  . 

Indeed, if $n$ is even, $n \ge 2$, then the central term in the binomial expansion is

\[ 2^{-n}\binom{n}{n/2} \le \frac{2}{\sqrt{2\pi n}}\, e^{1/(12n) - 2/(6n+1)} < \frac{2}{\sqrt{2\pi n}} \]

by Feller's approximation for $n!$ (Feller, 1968). If $n$ is odd and $n \ge 3$, then the maximal term in the binomial expansion is

\[ 2^{-n}\binom{n}{(n-1)/2} < 2^{-(n-1)}\binom{n-1}{(n-1)/2} < \frac{2}{\sqrt{2\pi(n-1)}} < \frac{2}{\sqrt{\pi n}}. \]


This  proves  the  aforementioned  inequality. 
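
The inequality just proved can be checked numerically for small $n$. The sketch below is only an illustration, not part of the proof: it assumes the event is read with a non-strict inequality, $|Y - n/2| \le a/2$, and that the final bound is $4a/\sqrt{\pi n}$. The function names are hypothetical.

```python
from math import comb, pi, sqrt


def prob_near_center(n: int, a: int) -> float:
    """Exact P{|Y - n/2| <= a/2} for Y ~ Binomial(n, 1/2)."""
    # |y - n/2| <= a/2 is equivalent to |2y - n| <= a in integer arithmetic.
    hits = sum(comb(n, y) for y in range(n + 1) if abs(2 * y - n) <= a)
    return hits / 2 ** n


def bound(n: int, a: int) -> float:
    """Assumed reading of the final bound: 4a / sqrt(pi * n)."""
    return 4 * a / sqrt(pi * n)


def check(max_n: int = 60) -> bool:
    """Return True if the bound holds for all 1 <= n <= max_n, 1 <= a <= n + 1."""
    return all(
        prob_near_center(n, a) <= bound(n, a)
        for n in range(1, max_n + 1)
        for a in range(1, n + 2)
    )


if __name__ == "__main__":
    print(check())
```

The check uses exact integer binomial coefficients, so only the final division and the square root are floating point.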

Now, let  M=2 , let  (XX,  6X,ZX) (XX  9* ,Z*)  be  the  ordered 

111  n n n 


permutation  of  where  x^Fra  and  let 


t 


Then, 


N;n  5i  } ,j  1,2‘ 

P^6ne(V;'X)^ne(Tn'X)i 

En  rrz  =ZXfj'lNin"N2nl<j} 
x j=l  i=l  jg=l  1 n-sn+i  ln  2n 

. ...  f £i£i 


x )=1 


A\ 

where  y ) = 0 if  j>kn.To  see  this, notice  that, conditioned  on 

s k 

()sr « ■ \x +i m » ■the evants tf|  it  ip  =zx}=)> 

n n n i=l  jt=  1 l n-s^+i  x 3 

and $\{|N_{1n}^x - N_{2n}^x| \le j\}$ are independent. Also, $N_{1n}^x$ is conditionally binomial with unknown probability parameter $p$ and known counting parameter $k_n$. It is clear that for all $j \ge 1$,

P(|NX-„/2UJ/2  |)£  +1.<+1.ZX  ) X 4J/V-T 

n n n 

by considering the worst case, that is, $p = 1/2$. The probability of the former event is the probability that out of an urn with $n$ balls, $k_n$ black ones and $n - k_n$ red ones, we pick, without replacement, exactly $j$ black balls and $s_n - j$ red balls.

From the properties of the hypergeometric distribution (Roussas, 1973) we know that

\[ \sum_{j=0}^{s_n} j \binom{k_n}{j}\binom{n-k_n}{s_n-j}\binom{n}{s_n}^{-1} = k_n s_n / n. \]

Thus , 

P{6  JV  .X)/T  0(r  ,X)  } «(44/n)  s Jk~  /n  . 

1 ne  n no  n n n 

The second part of lemma 6.4 follows at once by letting $s_n = 1$.

Q.E.D. 
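
The hypergeometric step of the proof can be illustrated with exact rational arithmetic. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the per-term bound $4j/\sqrt{\pi k_n}$ is taken as an assumption read from the binomial inequality at the start of the proof. It confirms that the mean number of black balls is $k_n s_n/n$, and that weighting the per-term bound by the hypergeometric probabilities gives a quantity proportional to $s_n\sqrt{k_n}/n$.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def hypergeom_pmf(n: int, k: int, s: int, j: int) -> Fraction:
    """P{exactly j black} when s of n balls (k black) are drawn without replacement."""
    return Fraction(comb(k, j) * comb(n - k, s - j), comb(n, s))


def mean_black(n: int, k: int, s: int) -> Fraction:
    """Exact mean of the hypergeometric distribution, summed term by term."""
    return sum(
        (j * hypergeom_pmf(n, k, s, j) for j in range(min(k, s) + 1)),
        Fraction(0),
    )


def assembled_bound(n: int, k: int, s: int) -> float:
    """sum_j P{j black} * 4j / sqrt(pi * k); the constant 4/sqrt(pi) is an assumption."""
    return float(mean_black(n, k, s)) * 4 / sqrt(pi * k)


if __name__ == "__main__":
    print(mean_black(10, 3, 4))        # equals k*s/n = 6/5
    print(assembled_bound(100, 9, 5))  # equals 4*s*sqrt(k) / (sqrt(pi)*n)
```

Setting `s = 1` in `assembled_bound` corresponds to the deleted case mentioned at the end of the proof.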

Proof  of  theorem  6.13 

Let $M = 2$, let $\pi_1 = \pi_2 = 1/2$ and let $\mu_1$ and $\mu_2$ both put mass 1 at 0. Let $\{\delta_n\}$ be a $\{k_n\}$-nearest neighbor rule with $k_n = n$ for all $n$. Let $N_{1n}$ be the number of observations $X_i$ for which $\theta_i = 1$ and let $n$ be even. It is clear that $L_n = 1/2$ and that $L_n^D = 1$ if $N_{1n} = n/2$. Thus,

\[ E\{(L_n - L_n^D)^2\} \ge (1/2)^2 P\{N_{1n} = n/2\} = \binom{n}{n/2} 2^{-n-2} > 1/(4\sqrt{2\pi n}) \]

and

\[ P\{|L_n - L_n^D| \ge 1/2\} \ge P\{N_{1n} = n/2\} > 1/\sqrt{2\pi n}. \]

Q.E.D. 
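
The counterexample can be made concrete with a short computation. The sketch below rests on assumptions not stated in the text: since all observations coincide, the nearest neighbor rule with $k_n = n$ is modeled as a majority vote over all remaining labels once one point is deleted, and ties (which cannot occur for even $n$) are broken in favor of state 2. The function names are hypothetical.

```python
from math import comb, pi, sqrt


def deleted_estimate(labels: list[int]) -> float:
    """Leave-one-out error of a majority vote over the remaining labels (1 or 2)."""
    n = len(labels)
    ones = sum(1 for t in labels if t == 1)
    errors = 0
    for t in labels:
        ones_left = ones - (1 if t == 1 else 0)
        twos_left = (n - 1) - ones_left
        # Assumed tie-break: decide 2 unless state 1 has a strict majority.
        decision = 1 if ones_left > twos_left else 2
        errors += decision != t
    return errors / n


def prob_balanced(n: int) -> float:
    """P{N_1n = n/2} for even n when both states have prior probability 1/2."""
    return comb(n, n // 2) / 2 ** n


if __name__ == "__main__":
    print(deleted_estimate([1, 2, 1, 2]))           # balanced sample
    print(prob_balanced(10), 1 / sqrt(2 * pi * 10))  # probability vs. lower bound
```

On a balanced sample every deleted point sees a strict majority of the opposite state, so the deleted estimate equals 1 while the true error is 1/2.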

Proof  of  theorem  6.15 

Let  M=2  and  let  {6  1 be  a fk  1-nearest  neighbor  rule  with 

n n 

a corresponding  deleted  decision  function  6 .nal.Let  0 , 

n Vjj  * j 

9^  ^ , l<l<n,be  independent  random  variables  (conditioned  on  V^) 

pt*v  ,X=tlVn’-5n)(Vn'Xl)'1‘)‘M' 
n 1 

P‘\1.x')|Vn)-rnJ(Vol.Xl).UJ«M. 
ni  i 

Then, 


with 

and 


« 


f 
and 


n 1 


ni'  i 


e!<-V21 


<2E((^-Ln,  ) + 2E((I{Sv  -I{6  , , 


n 1 

s 2 ( 1 f 6 k )/n  + 2P{8  /0  ) 

n Vn'Xl  Vnl'Xl 


nl'  1 


$\le 2(1 + 6k_n)/n + 2 \sup_x P\{|N_{1n}^x - N_{2n}^x| \le 1\}$

$\le 2(1 + 6k_n)/n + 8/\sqrt{\pi k_n}$

where we used (6.43) and the bound derived in the proof of lemma 6.4, and where

\[ N_{jn}^x = \sum_{i=1}^{k_n} I_{\{\theta_i^x = j\}}, \quad j = 1, 2. \]

If theorem 6.11 is used instead of (6.43), then the factor $6k_n$ in the bound can be replaced by $24\sqrt{k_n}/\sqrt{\pi}$.


Q.E.D. 